
Operating Systems

Detailed description of the topics

Please contact the supervisors for further details.

Miscellaneous

Analysis of Performance Variation on Modern Microprocessors (x86_64, ARM64)

With the Haswell-EP microprocessor, Intel moved from constant-performance CPU designs to designs with a constant TDP and variable performance [0]. Hardware performance variation can only be measured on a light-weight kernel (LWK) or a microkernel, because these avoid software-induced performance variation, also known as “OS noise”. The goal of this thesis is to port the benchmark suite hwvar [1] to L4Re or L4Linux, run it, and compare the measured variation with results obtained under the LWK IHK/McKernel.

Depending on the thesis (Bachelor or Master), optional components include:

  • attribute measured variation to functional units in the CPU (for example, using performance counters)
  • design new benchmarks and add them to hwvar
  • dig deep into “weird” results to explain them
  • could be done on x86_64 (Intel and/or AMD) and/or ARM
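As a rough illustration of what hwvar-style benchmarks measure, the following sketch times repeated runs of a fixed workload and reports its relative spread. It is plain Python, so OS noise will dominate here; hwvar itself runs pinned native code precisely to isolate the hardware-induced part. All names and the workload are illustrative.

```python
import statistics
import time

def fixed_workload(n: int = 20_000) -> int:
    # A deterministic, compute-only kernel: the same work every run.
    acc = 0
    for i in range(n):
        acc = (acc * 31 + i) & 0xFFFFFFFF
    return acc

def measure_variation(runs: int = 200) -> dict:
    # Time each run of the identical workload; any spread between
    # runs is performance variation (hardware- or software-induced).
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fixed_workload()
        samples.append(time.perf_counter_ns() - start)
    med = statistics.median(samples)
    return {
        "median_ns": med,
        "min_ns": min(samples),
        "max_ns": max(samples),
        # Relative spread of run times around the median.
        "rel_spread": (max(samples) - min(samples)) / med,
    }

if __name__ == "__main__":
    print(measure_variation())
```

On an LWK or microkernel, the remaining spread after removing OS noise is the hardware performance variation the thesis is after.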

[0] Joseph Schuchart et al.: The Shift from Processor Power Consumption to Performance Variations: Fundamental Implications at Scale. Computer Science - Research and Development, November 2016, https://wwwpub.zih.tu-dresden.de/~jschuch/publications/enahpc16_hsw_performance_cameraready.pdf

[1] https://github.com/hannesweisbach/hwvar

Supervisor: Hannes Weisbach

Vampir for Fiasco.OC

Vampir is a tool to analyse large trace files graphically. The assignment is to develop a tool that converts the tracebuffer data generated by Fiasco.OC to a Vampir-compatible format, such as OTF2, so that scheduling and communication can be visualized by Vampir (or a similar tool). Follow-up work includes performance analysis of HPC applications.

Supervisor: Adam Lackorzynski

Reproducing Measurements

An important principle of scientific work is to check results by reproducing experiments. A publication should contain all information needed to achieve that. While common in other scientific disciplines, experiment reproduction is rarely done in computer science, especially in "systems". In this work, an attempt will be made to

- check assumptions commonly made in published results

- reproduce measurements.

Together with the student, we will select some prominent examples for reproduction. We suggest starting with measurements of the cost of preemption and migration on modern computer architectures with large caches and deep cache hierarchies, and of interrupt latencies. An interesting side result of this work will (hopefully) be an understanding of the impact of architectural changes on systems.

Supervisor: Prof. Hermann Härtig

Scheduling

Using ATLAS to Predict Measures Other Than Execution Time

The ATLAS [1] scheduler uses metrics to predict task execution time. However, it is conceivable to use the same or a similar predictor to derive other application- and workload-related measures such as cache behavior, memory accesses, or energy consumption.

The thesis should experiment with the ATLAS predictor to analyse its ability and accuracy in predicting these other measures.
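The core of such a predictor can be sketched as least squares over per-job metrics. The sketch below uses a single metric and fits a hypothetical measure (say, cache misses per frame) instead of execution time; all names are illustrative and do not reflect the actual ATLAS interface.

```python
def fit_predictor(metrics, observed):
    """Least-squares fit: observed ~ a * metric + b (single metric).

    `observed` can be any per-job measure: execution time,
    cache misses, memory accesses, or energy consumption.
    """
    n = len(metrics)
    mean_x = sum(metrics) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in metrics)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(metrics, observed))
    a = sxy / sxx               # per-metric-unit cost
    b = mean_y - a * mean_x     # fixed per-job cost
    return lambda metric: a * metric + b

# Example: predict cache misses from a per-frame workload metric.
predict = fit_predictor([10, 20, 30, 40], [1050, 2010, 2980, 4060])
```

Whether such a linear model is as accurate for cache behavior or energy as it is for execution time is exactly what the thesis should evaluate.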

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch, Hannes Weisbach

Implementing Earliest Eligible Virtual Deadline First (EEVDF) on the L4Re microkernel

Commodity operating system schedulers rarely feature real-time support; instead they focus on throughput for servers, and on fairness and responsiveness for end-user devices. Stoica et al. proposed EEVDF [1], an algorithm that allows scheduling a real-time task set on a weighted fair queueing scheduler.

In this thesis, an EEVDF scheduler and a suitable API should be designed and implemented, either in user space or in kernel space, on the L4Re/Fiasco microkernel. Further, the thesis should evaluate the system's efficiency and the algorithm's efficacy.
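The selection rule at the heart of EEVDF can be sketched in a few lines: each client has a virtual eligible time and a virtual deadline (eligible time plus request length scaled by weight), and the scheduler picks, among the clients already eligible at the current virtual time, the one with the earliest virtual deadline. The sketch below omits virtual-time bookkeeping and lag accounting; names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Client:
    name: str
    weight: float      # proportional share relative to other clients
    request: float     # requested service time for the next quantum
    ve: float = 0.0    # virtual eligible time

    @property
    def vd(self) -> float:
        # Virtual deadline: eligible time plus request scaled by weight.
        return self.ve + self.request / self.weight

def pick_next(clients, vtime):
    # Among clients eligible at the current virtual time,
    # choose the one with the earliest virtual deadline.
    eligible = [c for c in clients if c.ve <= vtime]
    return min(eligible, key=lambda c: c.vd, default=None)
```

The real design work lies in maintaining virtual time and eligible times as requests complete, and in mapping this selection rule onto L4Re scheduling contexts.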

[1] Ion Stoica et al.: A Proportional Share Resource Allocation Algorithm for Real-Time, Time-Shared Systems. RTSS 1996, https://pdfs.semanticscholar.org/5b1d/784aae0d5a8c791c09a3274afc25ae9f2700.pdf

Supervisor: Michael Roitzsch, Hannes Weisbach

ATLAS Integration for Recent FFmpeg

Execution time prediction within ATLAS [1] is based on metrics. A metric is a value from the application domain that is required to correlate with the amount of work to be performed. Roitzsch [2] proposed a set of 13 metrics to be used in the prediction of decoding times of H.264 frames. These metrics were embedded as metadata in H.264 streams. An ATLAS-enabled media player would extract this metadata and pass the metrics to the ATLAS runtime. Since the original implementation, FFmpeg has undergone a number of changes, so the original implementation cannot easily be re-used.

This programming-oriented thesis should implement the H.264 metrics handling in a recent version of FFmpeg and export the metrics to an ATLAS-enabled media player (FFplay, for example). The written work should discuss tooling to analyse the FFmpeg code and how to extract the metrics.

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch, Hannes Weisbach

Handle Overload Bursts in ATLAS

ATLAS [1] is a real-time scheduler for Linux that is developed at the Operating Systems chair. ATLAS simplifies real-time programming by relieving the developer from the need to specify periods, priorities or execution times.

ATLAS deals with work that did not finish within its assigned time slot by downgrading it to a lower priority. While this policy prevents real-time work from monopolising the CPU, it presents a problem during overload: when a CPU is overloaded, work might be assigned to time slots in the past. Such work items can never finish within their assigned slot, so their priority is pessimistically lowered.

This thesis should implement and evaluate an algorithm to detect short-term overload and avoid assigning work items to the past, or fix such a schedule where work items have been assigned to time slots in the past. Both possibilities can be implemented and compared for efficiency.
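One of the two possibilities, repairing a schedule in which slots have slipped into the past, can be sketched as a single pass that pushes slot start times forward; detecting short-term overload then amounts to checking whether any shift was necessary. The slot representation and names here are illustrative, not the ATLAS data structures.

```python
def repair_schedule(slots, now):
    """Shift time slots so that none starts in the past.

    `slots` is a list of (start, length) tuples sorted by start time.
    Returns (repaired_slots, overloaded): `overloaded` is True iff at
    least one slot had to be moved, i.e. short-term overload occurred.
    """
    repaired = []
    cursor = now
    overloaded = False
    for start, length in slots:
        if start < cursor:
            overloaded = True   # slot would begin in the past
            start = cursor      # push it to the earliest feasible time
        repaired.append((start, length))
        cursor = start + length  # the next slot cannot begin earlier
    return repaired, overloaded
```

With such a repair in place, work items keep feasible slots and escape the pessimistic downgrade; the thesis should compare this against detecting overload up front and refusing to assign slots in the past.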

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch, Hannes Weisbach

Integrating Execution Time Estimation in HPC Applications

Applications in High-Performance Computing often operate in processing steps that alternate between the calculation of one simulation step and a phase of data exchange. For load balancing decisions it would be useful to know ahead of time how long the individual processing nodes will be busy working on one iteration.

The ATLAS [1] scheduler has been developed at the Operating Systems chair as a real-time scheduler for Linux. Part of it is a machine learning component that uses linear regression analysis to predict execution times.

This thesis should apply the ATLAS execution time prediction in the HPC context. HPC applications need to communicate workload metrics to the predictor so it can learn the execution time behavior. Interfaces for this information exchange should be designed and implemented. The approach should be demonstrated with at least one real-world HPC application.
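The interface between application and predictor could look like the following sketch: before each iteration the application submits a workload metric and receives an estimate, and after the iteration it reports the measured time, which updates a running per-metric-unit cost. This is an illustrative API, not the actual ATLAS interface.

```python
class IterationPredictor:
    """Running estimate of seconds per unit of workload metric."""

    def __init__(self):
        self.total_metric = 0.0
        self.total_time = 0.0

    def predict(self, metric: float) -> float:
        # Estimate before the iteration starts; 0.0 until trained.
        if self.total_metric == 0.0:
            return 0.0
        return metric * (self.total_time / self.total_metric)

    def observe(self, metric: float, elapsed: float) -> None:
        # Report the measured execution time after the iteration.
        self.total_metric += metric
        self.total_time += elapsed

# Usage in a simulation loop (illustrative):
#   est = predictor.predict(num_particles)   # feed load balancer
#   ... compute one simulation step ...
#   predictor.observe(num_particles, measured_seconds)
```

ATLAS itself uses linear regression over several metrics rather than this single running ratio; the sketch only shows the shape of the information exchange the thesis should design.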

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch

End-to-end Responsiveness with the ATLAS Scheduler

ATLAS [1] is a real-time scheduler for Linux that is developed at the Operating Systems chair. ATLAS simplifies real-time programming by relieving the developer from the need to specify periods, priorities or execution times.

A major goal of ATLAS is to improve the responsiveness of graphical user interfaces. Currently, only the CPU work that is executed as a result of user interaction can be easily scheduled by ATLAS. The code path leading up to this final CPU work, however, is not real-time scheduled. This includes handling the keyboard, mouse, or touch-screen interrupt and the GUI framework code that dispatches this HID event to the target UI control.

This thesis should close the gap and enable end-to-end responsiveness by modifying the Linux kernel such that device interrupt handlers are ATLAS-scheduled. The user-level code that wakes up as a result of such an interrupt should also automatically run as an ATLAS job to receive real-time treatment.

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch

Device Scheduling Using the ATLAS Concept

ATLAS [1] is a real-time scheduler for Linux that is developed at the Operating Systems chair. ATLAS simplifies real-time programming by relieving the developer from the need to specify periods, priorities or execution times.

ATLAS currently schedules only one resource: the CPU. The goal of this thesis is to extend the ATLAS concept towards device scheduling. For peripherals like disks or network, applications would submit work items to these devices with a timing requirement like a deadline and some notion of the amount of work to be performed. The device-specific scheduler would need to be modified to schedule jobs to meet the deadlines.
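The submission side of such a device scheduler can be sketched as a priority queue ordered by deadline: applications enqueue work items with a deadline and some notion of the amount of work, and the device dispatches the item with the earliest deadline first. Admission control and device-specific cost models are the actual research content; the names below are illustrative.

```python
import heapq
import itertools

class DeviceQueue:
    """Deadline-ordered queue of device work items (EDF dispatch)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker for equal deadlines

    def submit(self, deadline: float, work: int, payload) -> None:
        # `work` is the application's notion of the amount of work,
        # e.g. bytes to read from disk or packets to send.
        heapq.heappush(self._heap, (deadline, next(self._seq), work, payload))

    def dispatch(self):
        # Hand the earliest-deadline item to the device driver.
        if not self._heap:
            return None
        deadline, _, work, payload = heapq.heappop(self._heap)
        return deadline, work, payload
```

For real devices, plain EDF is not enough: a disk scheduler, for instance, must trade deadline order against seek costs, which is where the `work` estimate comes in.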

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch

Cache Coloring on SMP Systems

OS-controlled cache partitioning has been proposed to limit cache-induced interference between processes on one CPU, especially to make caches more usable in real-time systems. In this work, the technique will be extended to multiprocessors of various architectures. What seems simple at first sight becomes much more difficult once modern cache architectures are included in the investigation. One example is smart caches, which attempt to partition caches dynamically without software influence.
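The basic arithmetic behind page coloring is what makes the single-CPU case look simple: the number of colors is the cache size divided by associativity and page size, and a page's color follows from its physical frame number. A minimal sketch, with illustrative cache parameters:

```python
def num_colors(cache_size: int, ways: int, page_size: int) -> int:
    # One color per group of cache sets covered by a page:
    # colors = cache_size / (associativity * page_size).
    return cache_size // (ways * page_size)

def page_color(paddr: int, colors: int, page_size: int = 4096) -> int:
    # Physical pages whose frame numbers are congruent modulo `colors`
    # map to the same cache sets, i.e. the same cache partition.
    return (paddr // page_size) % colors

# Example: a 2 MiB, 16-way cache with 4 KiB pages yields 32 colors.
```

On SMP systems with shared, sliced, or dynamically partitioned last-level caches, this direct address-to-set mapping no longer holds, which is exactly where the investigation becomes difficult.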

Literature: OS-Controlled Cache Partitioning

Supervisor: Prof. Hermann Härtig

Time Partitions

Provide means to schedule a compartment of L4 tasks.

A set of L4 tasks, also called a compartment, needs to be scheduled in a uniform fashion. This work will build a mechanism that allows defining a CPU share for a set of tasks that will then be scheduled as one group.

Supervisor: Alexander Warg, Adam Lackorzynski

User-level components

Memory Characteristics of HPC Applications

Scalability of applications is a major concern today — even more so in the field of High Performance Computing (HPC), where single applications easily span hundreds or thousands of CPUs. The question of how to support such scaling by adapting applications, and possibly algorithms, is an active field of research. However, scaling across multiple machines also significantly influences the memory consumption of applications, e.g. due to additional communication buffers or data held redundantly to avoid costly data transfers.

In this thesis, the memory footprints of HPC applications are to be analysed. The student will need to devise an appropriate monitoring infrastructure to capture the dynamic memory usage of processes. These measurements should have low overhead and still track the memory allocations and deallocations of the various parts of the process, e.g. different libraries. Existing solutions like Score-P [1] may provide a decent starting point. With the measurement setup in place, the memory implications of scaling HPC applications should be analysed.
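To illustrate the kind of per-call-site data such a monitoring infrastructure should produce, the sketch below uses Python's tracemalloc to attribute live allocations to their allocation sites. This is not the low-overhead native infrastructure the thesis asks for (HPC codes are typically C/C++/Fortran), but it shows the desired breakdown of memory usage by program component.

```python
import tracemalloc

def snapshot_top_allocators(workload, limit: int = 3):
    # Record allocations while `workload()` runs and report the call
    # sites with the largest live allocations -- the per-component
    # breakdown a monitoring infrastructure should provide.
    tracemalloc.start()
    workload()
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    stats = snapshot.statistics("lineno")[:limit]
    return [(str(s.traceback), s.size) for s in stats]
```

For native HPC processes, the analogous data would come from interposing on the allocator (e.g. via malloc hooks or preloaded wrappers), attributed per shared library rather than per source line.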

[1] http://www.vi-hps.org/projects/score-p/

Supervisor: Jan Bierbaum

The Price of Reliability

Protocols like TCP provide reliable, connection-oriented data transfer over unreliable network connections. This feature is highly convenient for users, who do not need to take care of error checking, potential retransmissions, and duplicate message handling. However, there is a price to be paid in the form of additional overhead. As not all applications can tolerate these overheads, or require reliable connections in the first place, unreliable protocols such as UDP exist. In High Performance Computing (HPC), high-bandwidth, low-latency networks such as InfiniBand are prevalent. These networks also offer different modes of operation, and reliable connections incur additional overhead here as well.

The goal of this task is to add support for unreliable InfiniBand transfers to UCX [1] and to evaluate the gains (latency and bandwidth) and drawbacks (e.g. the rate of lost messages) in a real system. Depending on the type of thesis (Bachelor/Beleg or Master/Diplom), this task can be expanded to include implementing or adapting existing fault-tolerant algorithms, like Corrected Gossip [2], to profit from both the low overhead of unreliable data transfers and the robustness of reliable connections.

[1] http://www.openucx.org/

[2] https://htor.inf.ethz.ch/publications/img/corrected_gossip.pdf

Supervisor: Jan Bierbaum

Distributed Execution with GCD

Apple’s Grand Central Dispatch (GCD) is a programming paradigm for organizing asynchronous execution. However, it currently assumes shared memory between the concurrently executing threads. Because of its simplicity, we would like to use it for distributed applications that are currently programmed with MPI.

To this end, a distributed shared memory (DSM) infrastructure should be implemented on top of our L4 kernel. A user-level pager that implements MESI or another coherency protocol for memory pages allows a single global address space to span a distributed system of multiple machines. A simple network protocol between the nodes is needed to manage remote execution of code.

The thesis should evaluate the work by implementing a simple numerical application with GCD and running it distributed on multiple nodes. GCD semantics can be beneficial for relaxed DSM consistency, because consistency is only required at block boundaries.
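The per-page coherency logic of such a pager can be sketched as a MESI state machine: each node tracks the state of every page and updates it on local accesses and on coherency messages from remote pagers. The table below is a deliberately simplified sketch (e.g. a local read miss conservatively enters Shared rather than Exclusive), with illustrative event names.

```python
# Page states: M(odified), E(xclusive), S(hared), I(nvalid).
# Events: local read/write on this node, or a remote node's
# read/write observed via the pager's network protocol.
MESI = {
    ("I", "local_read"):   "S",  # simplified: read miss enters Shared
    ("I", "local_write"):  "M",
    ("S", "local_read"):   "S",
    ("S", "local_write"):  "M",  # requires invalidating remote copies
    ("S", "remote_read"):  "S",
    ("S", "remote_write"): "I",
    ("E", "local_read"):   "E",
    ("E", "local_write"):  "M",
    ("E", "remote_read"):  "S",
    ("E", "remote_write"): "I",
    ("M", "local_read"):   "M",
    ("M", "local_write"):  "M",
    ("M", "remote_read"):  "S",  # page contents must be transferred
    ("M", "remote_write"): "I",  # page contents must be transferred
}

def next_state(state: str, event: str) -> str:
    return MESI[(state, event)]
```

GCD's block boundaries allow relaxing this: invalidations and transfers only need to take effect when a dispatched block starts or finishes, not on every access.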

Supervisor: Michael Roitzsch, Carsten Weinhold

Last modified: 12th Jun 2018, 12.06 AM
Author: Webmaster