Operating Systems

Detailed description of the topics

Please contact the supervisors for further details.

Miscellaneous

Analysis of HPC Performance Variation

Modern processors no longer deliver constant performance; instead, they run as efficiently as possible within a fixed power budget. As a consequence, CPU performance is subject to fluctuations. To rule out software-induced performance variation ("OS noise"), hardware-induced performance variation can only be measured meaningfully on so-called light-weight kernels (LWKs) or on microkernels. The goal of this work is to run the benchmark suite hwvar [1] on L4Re and/or L4Linux and to compare the measured values with those obtained on an LWK (IHK/McKernel).
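
As a rough illustration of the kind of measurement involved, the following sketch times a fixed amount of compute work many times and reports the spread; the kernel and all names here are invented, hwvar ships its own set of benchmarks.

  // Toy benchmark in the spirit of hwvar: time a fixed, purely compute-bound
  // kernel many times and report the spread of the measured runtimes.
  #include <algorithm>
  #include <chrono>
  #include <cstdio>
  #include <vector>

  static volatile double sink;      // keeps the compiler from removing the kernel

  static void kernel()              // fixed amount of compute-bound work
  {
      double x = 1.0;
      for (int i = 0; i < 1000000; ++i)
          x = x * 1.000001 + 0.000001;
      sink = x;
  }

  int main()
  {
      constexpr int runs = 1000;
      std::vector<double> us(runs);

      for (int r = 0; r < runs; ++r) {
          auto t0 = std::chrono::steady_clock::now();
          kernel();
          auto t1 = std::chrono::steady_clock::now();
          us[r] = std::chrono::duration<double, std::micro>(t1 - t0).count();
      }

      std::sort(us.begin(), us.end());
      // On a noisy OS part of the spread is software-induced; on an LWK or a
      // microkernel the remaining spread is attributed to the hardware.
      std::printf("min %.1f us  median %.1f us  max %.1f us\n",
                  us.front(), us[runs / 2], us.back());
  }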

Optional components of the work are:

  • attribute the measured variation to functional units (e.g. via performance counters)
  • add new benchmarks to the suite for this purpose, if necessary
  • explain "odd" measurements

[1] https://github.com/hannesweisbach/hwvar

Supervisor: Hannes Weisbach

Vampir for Fiasco.OC

Vampir is a tool to analyse large trace files graphically. The assignment is to develop a tool that converts the tracebuffer data generated by Fiasco.OC to a Vampir-compatible format, such as OTF2, so that scheduling and communication can be visualized by Vampir (or a similar tool). Follow-up work includes performance analysis of HPC applications.
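
A sketch of the writer side of such a converter, modelled on the serial writer example from the OTF2 documentation; decoding the Fiasco.OC tracebuffer and writing the definition records (strings, regions, locations) are left out, and the timestamps and region IDs below are placeholders.

  // Writer-side sketch: open an OTF2 archive and emit events for decoded
  // tracebuffer entries.  Definition records are omitted; a trace that Vampir
  // can display also needs strings, regions and locations to be defined.
  #include <otf2/otf2.h>

  static OTF2_FlushType pre_flush(void*, OTF2_FileType, OTF2_LocationRef, void*, bool)
  {
      return OTF2_FLUSH;
  }

  static OTF2_TimeStamp post_flush(void*, OTF2_FileType, OTF2_LocationRef)
  {
      return 0;   // placeholder timestamp for the flush record
  }

  static OTF2_FlushCallbacks flush_callbacks = { pre_flush, post_flush };

  int main()
  {
      OTF2_Archive* archive = OTF2_Archive_Open("trace", "fiasco",
                                                OTF2_FILEMODE_WRITE,
                                                1024 * 1024,     /* event chunk size */
                                                4 * 1024 * 1024, /* definition chunk size */
                                                OTF2_SUBSTRATE_POSIX,
                                                OTF2_COMPRESSION_NONE);
      OTF2_Archive_SetFlushCallbacks(archive, &flush_callbacks, nullptr);
      OTF2_Archive_SetSerialCollectiveCallbacks(archive);
      OTF2_Archive_OpenEvtFiles(archive);

      OTF2_EvtWriter* writer = OTF2_Archive_GetEvtWriter(archive, 0 /* e.g. one location per CPU */);

      // For every decoded tracebuffer entry emit a matching OTF2 event,
      // e.g. an Enter/Leave pair for a context switch or an IPC operation.
      OTF2_EvtWriter_Enter(writer, nullptr, /* timestamp */ 1000, /* region */ 0);
      OTF2_EvtWriter_Leave(writer, nullptr, /* timestamp */ 2000, /* region */ 0);

      OTF2_Archive_CloseEvtWriter(archive, writer);
      OTF2_Archive_CloseEvtFiles(archive);
      OTF2_Archive_Close(archive);
  }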

Supervisor: Adam Lackorzynski

Reproducing Measurements

An important principle of scientific work is to check results by reproducing experiments. A publication should contain all the information needed to do so. While common in other scientific disciplines, experiments are rarely reproduced in computer science, and especially rarely in "systems" research. In this work, an attempt will be made to

- check assumptions commonly made in published results

- reproduce measurements.

Together with the student, we will select some prominent examples for reproduction. We suggest starting with measurements of the cost of preemption and migration on modern computer architectures with large caches and deep cache hierarchies, and of interrupt latencies. An interesting side result of this work will (hopefully) be an understanding of the impact of architectural changes on systems.
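
One possible starting point for the migration measurements, sketched under Linux with arbitrary core numbers and an arbitrary working-set size: warm a working set on one core, move the thread with sched_setaffinity(), and compare the first (cache-cold) traversal on the new core with later (cache-warm) ones.

  // Rough sketch of measuring the cache-related cost of migrating a thread.
  #ifndef _GNU_SOURCE
  #define _GNU_SOURCE             // for sched_setaffinity() / CPU_SET()
  #endif
  #include <sched.h>
  #include <chrono>
  #include <cstdio>
  #include <vector>

  static void pin_to_cpu(int cpu)
  {
      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET(cpu, &set);
      sched_setaffinity(0, sizeof(set), &set);   // 0 = calling thread
  }

  static double traverse_us(const std::vector<long>& ws)
  {
      auto t0 = std::chrono::steady_clock::now();
      long sum = 0;
      for (long v : ws)
          sum += v;
      auto t1 = std::chrono::steady_clock::now();
      static volatile long sink; sink = sum;     // defeat dead-code elimination
      return std::chrono::duration<double, std::micro>(t1 - t0).count();
  }

  int main()
  {
      std::vector<long> ws(4 * 1024 * 1024 / sizeof(long), 1);   // ~4 MiB working set

      pin_to_cpu(0);
      traverse_us(ws);                            // warm the caches on CPU 0

      pin_to_cpu(1);                              // migrate to CPU 1
      std::printf("first traversal after migration: %.1f us\n", traverse_us(ws));
      std::printf("second traversal (warm):         %.1f us\n", traverse_us(ws));
  }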

Supervisor: Prof. Hermann Härtig

Networking

HPC Applications on Apache Mesos

Mesos is a cloud management system for clusters of machines that provides support for running MPI applications. The goal of this project is to set up a local Mesos cluster on machines of the OS chair and run MPI benchmark applications on it in order to compare the performance characteristics with native MPI runs.

As an extension, an analysis should be conducted of the benefits and disadvantages of using a cloud cluster manager instead of a traditional HPC infrastructure.

Supervisor: Michael Roitzsch

Scheduling

Handle Overload Bursts in ATLAS

ATLAS [1] is a real-time scheduler for Linux that is developed at the Operating Systems chair. ATLAS simplifies real-time programming by relieving the developer from the need to specify periods, priorities or execution times.

ATLAS deals with work that did not finish within the assigned time slot by downgrading it to a lower priority. While this policy prevents real-time work from monopolising the CPU, it presents a problem during overload: when a CPU is overloaded, work might be assigned to time slots in the past. Such work items can, of course, never finish within their assigned slot, so their priority is lowered pessimistically.

This thesis should implement and evaluate an algorithm that detects short-term overload and either avoids assigning work items to slots in the past or repairs a schedule in which such assignments have already happened. Both approaches can be implemented and compared for efficiency.
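
A simplified model of the overload condition (not the actual ATLAS code): slots are laid out back to back in front of their deadlines, and if the computed start of some slot lies before "now", the job set has overrun into the past.

  // Simplified overload check: process jobs from the latest deadline backwards,
  // packing each slot directly in front of the next; a slot that would have to
  // start before "now" can no longer be scheduled entirely in the future.
  #include <algorithm>
  #include <cstdio>
  #include <vector>

  struct Job {
      double deadline;     // absolute deadline in seconds
      double exec_time;    // estimated execution time in seconds
  };

  static bool detect_overload(std::vector<Job> jobs, double now)
  {
      std::sort(jobs.begin(), jobs.end(),
                [](const Job& a, const Job& b) { return a.deadline < b.deadline; });

      double slot_end = jobs.empty() ? now : jobs.back().deadline;
      for (auto it = jobs.rbegin(); it != jobs.rend(); ++it) {
          slot_end = std::min(slot_end, it->deadline);
          double slot_start = slot_end - it->exec_time;
          if (slot_start < now)
              return true;           // this slot starts in the past
          slot_end = slot_start;
      }
      return false;
  }

  int main()
  {
      std::vector<Job> jobs = { {0.5, 0.45}, {1.0, 0.4}, {1.2, 0.4} };
      std::printf("short-term overload: %s\n",
                  detect_overload(jobs, /* now = */ 0.0) ? "yes" : "no");
  }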

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch, Hannes Weisbach

Thermal and Energy Scheduling with the ATLAS Scheduler

ATLAS [1] is a real-time scheduler for Linux that is developed at the Operating Systems chair. ATLAS simplifies real-time programming by relieving the developer from the need to specify periods, priorities or execution times.

The ATLAS scheduler has rich knowledge about the application's timing needs. It could use this knowledge for advanced thermal scheduling, such as deferring work that is not immediately needed so that Intel Turbo Boost can accelerate more urgent jobs. The ATLAS scheduler also allows implementing a variety of scheduling policies, such as race-to-idle or consolidate-to-idle, which have varying energy characteristics.
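
A toy energy model with invented power numbers shows why the policy choice matters: depending on how dynamic and static power compare, racing to idle may win or lose against running a job just fast enough to finish at its deadline.

  // Toy model (all numbers invented): compare race-to-idle with a just-in-time
  // policy, assuming dynamic power scales roughly with f^3 and static power is
  // paid whenever the core is not in a deep sleep state.
  #include <cstdio>

  int main()
  {
      const double work      = 1.0;   // job length in seconds at f_max
      const double deadline  = 2.0;   // seconds until the result is needed
      const double p_dyn_max = 10.0;  // dynamic power at f_max, watts
      const double p_static  = 2.0;   // static power while awake, watts
      const double p_sleep   = 0.2;   // power in a deep idle state, watts

      // Race-to-idle: run at f_max, then sleep until the deadline.
      double e_race = (p_dyn_max + p_static) * work + p_sleep * (deadline - work);

      // Just-in-time: stretch the job over the whole interval at reduced frequency.
      double f_rel  = work / deadline;                               // 0.5 of f_max
      double e_slow = (p_dyn_max * f_rel * f_rel * f_rel + p_static) * deadline;

      std::printf("race-to-idle: %.2f J   just-in-time: %.2f J\n", e_race, e_slow);
  }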

This thesis should implement selected scheduling policies and evaluate their energy and performance impact.

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch

Integrating Execution Time Estimation in HPC Applications

Applications in high-performance computing often proceed in processing steps that alternate between the calculation of one simulation step and a phase of data exchange. For load-balancing decisions it would be useful to know ahead of time how long the individual processing nodes will be busy working on one iteration.

The ATLAS [1] scheduler has been developed at the Operating Systems chair as a real-time scheduler for Linux. Part of it is a machine learning component that uses linear regression analysis to predict execution times.

This thesis should apply the ATLAS execution time prediction in the HPC context. HPC applications need to communicate workload metrics to the predictor so it can learn the execution time behavior. The interfaces for this information exchange should be designed and implemented. The result should be demonstrated with at least one real-world HPC application.
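
A minimal sketch of the prediction step, assuming a single workload metric (say, the number of grid cells assigned to a rank); ATLAS itself fits a linear model over several application-supplied metrics.

  // Fit seconds ~ a * metric + b over past iterations by ordinary least squares
  // and predict the execution time of the next iteration from its metric.
  #include <cstdio>
  #include <vector>

  struct Sample { double metric, seconds; };

  static double predict(const std::vector<Sample>& history, double next_metric)
  {
      double n = history.size(), sx = 0, sy = 0, sxx = 0, sxy = 0;
      for (const Sample& s : history) {
          sx  += s.metric;            sy  += s.seconds;
          sxx += s.metric * s.metric; sxy += s.metric * s.seconds;
      }
      double a = (n * sxy - sx * sy) / (n * sxx - sx * sx);
      double b = (sy - a * sx) / n;
      return a * next_metric + b;
  }

  int main()
  {
      // Execution times observed for previous iterations and their metrics.
      std::vector<Sample> history = { {1e6, 0.52}, {2e6, 1.01}, {3e6, 1.49} };
      // Before the next iteration the application reports its workload metric
      // and the scheduler reserves roughly the predicted amount of CPU time.
      std::printf("predicted time for 4e6 cells: %.2f s\n", predict(history, 4e6));
  }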

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch

Device Scheduling Using the ATLAS Concept

ATLAS [1] is a real-time scheduler for Linux that is developed at the Operating Systems chair. ATLAS simplifies real-time programming by relieving the developer from the need to specify periods, priorities or execution times.

ATLAS currently schedules only one resource: the CPU. The goal of this thesis is to extend the ATLAS concept towards device scheduling. For peripherals such as disks or network interfaces, applications would submit work items with a timing requirement like a deadline and some notion of the amount of work to be performed. The device-specific scheduler would then have to be modified to dispatch these jobs so that the deadlines are met.
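
As a rough illustration of the intended interface, the following sketch (all names and fields are hypothetical, not an existing kernel API) lets work items carry a deadline and an amount of work, and has the device queue dispatch them earliest-deadline-first.

  // Hypothetical deadline-based device queue: requests carry an absolute
  // deadline and a notion of work (bytes) and are dispatched EDF-style.
  #include <cstdint>
  #include <cstdio>
  #include <queue>
  #include <vector>

  struct IoRequest {
      uint64_t deadline_ns;   // absolute deadline
      uint64_t bytes;         // amount of work, usable for admission checks
      int      id;
  };

  struct ByDeadline {
      bool operator()(const IoRequest& a, const IoRequest& b) const {
          return a.deadline_ns > b.deadline_ns;    // min-heap on the deadline
      }
  };

  int main()
  {
      std::priority_queue<IoRequest, std::vector<IoRequest>, ByDeadline> queue;
      queue.push({3000000, 64 * 1024, 1});
      queue.push({1000000, 4 * 1024, 2});
      queue.push({2000000, 128 * 1024, 3});

      while (!queue.empty()) {                     // dispatch earliest deadline first
          IoRequest r = queue.top();
          queue.pop();
          std::printf("dispatch request %d (deadline %llu ns, %llu bytes)\n", r.id,
                      (unsigned long long)r.deadline_ns, (unsigned long long)r.bytes);
      }
  }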

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch

Cache Coloring on SMP Systems

OS-controlled cache partitioning has been proposed to limit cache-induced interference between processes on one CPU, especially with the aim of making caches more usable in real-time systems. In this work, the technique will be extended to multiprocessors of various architectures. What seems simple at first sight becomes much more difficult once modern cache architectures are included in the investigation. An example is the class of smart caches, which attempt to partition the cache dynamically without software involvement.
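
The core idea of page coloring in a few lines, with invented cache parameters: the color of a physical page frame is given by the cache-index bits above the page offset, and a colored allocator hands each partition only frames of its colors.

  // Page coloring with invented cache parameters (physically indexed cache).
  #include <cstdint>
  #include <cstdio>

  constexpr uint64_t page_size  = 4096;
  constexpr uint64_t cache_size = 2 * 1024 * 1024;                   // e.g. 2 MiB shared cache
  constexpr uint64_t ways       = 16;                                // e.g. 16-way associative
  constexpr uint64_t colors     = cache_size / (ways * page_size);   // 32 colors

  static uint64_t color_of(uint64_t phys_addr)
  {
      return (phys_addr / page_size) % colors;
  }

  int main()
  {
      // Partitions that receive disjoint color sets never compete for the same
      // cache sets -- provided the hardware really indexes the cache by these
      // physical address bits, which "smart" or hashed cache designs may not do.
      std::printf("color of frame 0x12345000: %llu\n",
                  (unsigned long long)color_of(0x12345000));
  }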

Literature: OS-Controlled Cache Partitioning

Supervisor: Prof. Hermann Härtig

Time Partitions

Provide means to schedule a compartment of L4 tasks.

A set of L4 tasks, also called a compartment, needs to be scheduled in a uniform fashion. This work will build a mechanism that allows a CPU share to be defined for a set of tasks, which is then scheduled as one group.
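
One possible shape of such a mechanism, sketched with hypothetical names: each compartment owns a CPU budget that is replenished every period, and tasks of a compartment are only eligible to run while their compartment still has budget left.

  // Per-compartment CPU budget: replenished each period, consumed while any
  // task of the compartment runs; without budget, the compartment is skipped.
  #include <cstdint>
  #include <cstdio>

  struct Compartment {
      uint64_t budget_us;      // CPU share per period, e.g. 3 ms of a 10 ms period
      uint64_t remaining_us;   // budget left in the current period
  };

  static void replenish(Compartment& c)      { c.remaining_us = c.budget_us; }
  static bool eligible(const Compartment& c) { return c.remaining_us > 0; }

  static void account(Compartment& c, uint64_t ran_us)
  {
      c.remaining_us -= (ran_us < c.remaining_us) ? ran_us : c.remaining_us;
  }

  int main()
  {
      Compartment c{3000, 0};   // 3 ms out of every 10 ms period
      replenish(c);             // called at the start of each period
      account(c, 2000);         // a task of the compartment ran for 2 ms
      std::printf("eligible: %s, remaining: %llu us\n",
                  eligible(c) ? "yes" : "no", (unsigned long long)c.remaining_us);
  }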

Supervisor: Alexander Warg, Adam Lackorzynski

User-level components

Distributed Execution with GCD

Apple’s Grand Central Dispatch (GCD) is a programming paradigm for organizing asynchronous execution. However, it currently assumes shared memory between the concurrently executing threads. Because of its simplicity, we would like to use it for distributed applications that are currently programmed with MPI.

To this end, a distributed shared memory (DSM) infrastructure should be implemented on top of our L4 kernel. A user-level pager that implements MESI or another coherence protocol for memory pages allows one global address space to span a distributed system of multiple machines. A simple network protocol between the nodes is needed to manage remote execution of code.
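
The following sketch shows the per-page bookkeeping such a pager might keep; it is a reduced MSI-style state machine with the mapping and network operations left out, not a prescription of the protocol to use.

  // Reduced MSI-style state machine for one DSM page, driven from the
  // page-fault path of the user-level pager.
  #include <cstdio>

  enum class PageState { Invalid, Shared, Modified };

  struct Page {
      PageState state = PageState::Invalid;
  };

  static void on_fault(Page& p, bool write)
  {
      switch (p.state) {
      case PageState::Invalid:
          // fetch page contents from the current owner (network request, not shown)
          p.state = write ? PageState::Modified : PageState::Shared;
          break;
      case PageState::Shared:
          if (write) {
              // invalidate remote copies (network request, not shown)
              p.state = PageState::Modified;
          }
          break;
      case PageState::Modified:
          break;                // already exclusive on this node, nothing to do
      }
  }

  int main()
  {
      Page p;
      on_fault(p, /* write = */ false);   // read fault:  Invalid -> Shared
      on_fault(p, /* write = */ true);    // write fault: Shared  -> Modified
      std::printf("final state: %d\n", static_cast<int>(p.state));
  }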

The thesis should evaluate the work by implementing a simple numerical application with GCD and running it distributed on multiple nodes. GCD semantics can be beneficial for relaxed DSM consistency, because consistency is only required at block boundaries.

Supervisor: Michael Roitzsch, Carsten Weinhold
