TUD Logo

TUD Home » ... » Teaching » Distributed Operating Systems » Memory Consistency and Cache Coherency

Operating Systems

Exercise: Memory Consistency and Cache Coherency

In the tutorial, all solutions will be presented by students. Please be prepared for all questions as the exercise will focus on discussion, not on understanding the question and gathering the knowledge.

Memory Consistency

  1. Sequential Consistency

    In a system with sequential consistency each processor always executes memory operations in the order specified by its program (program order). The order in which the individual memory operations of each processor become visible to the other processors on the shared interconnect (e.g., the bus) is called visibility order.

    Three processors (P1, P2 and P3) in a shared-memory system execute the following code (initially A = B = 0).

    P1 P2 P3
    a1: A := 1 a2: u := A a3: v := B
    b2: B := 1 b3: w := A

    Here a1 denotes the first operation of processor P1, a2 denotes the first operation of P2 and b2 denotes the second operation of P2, etc.

    The outcome of the execution, denoted by the tuple (u,v,w), may vary depending on the order in which the individual operations of each processor become globally visible. Some outcomes may not be possible on a sequentially consistent system. Complete the following table with the possible results for (u,v,w). For each row describe if the result is sequentially consistent and if so, specify a visibility order that produces the result. An example is given in the second row.

    u v w Sequentially Consistent Visibility Order
    0 0 0
    0 0 1 yes a2,a3,a1,b3,b2
    0 1 0
    0 1 1
    1 0 0
    1 0 1
    1 1 0
    1 1 1
  2. Relaxed Consistency: Peterson-Algorithm

    A well-known algorithm for mutual exclusion is Peterson's algorithm (shown in pseudo-code below). Explain why Peterson's algorithm does not break on machines with a store buffer where reads are not permitted to bypass writes to the same memory location and why it does break if reads are permitted to bypass writes to the same memory location on systems with store forwarding (e.g., SPARC TSO).

    // global variables and initial values
    bool flag0 = false, flag1 = false;
    int turn = 0;

    // Process on CPU0
    flag0 = true;
    turn = 1;
    while (turn == 1 && flag1);
    // critical section flag0 = false;
    // Process on CPU1
    flag1 = true;
    turn = 0;
    while (turn == 0 && flag0);
    // critical section
    flag1 = false;
  3. Fence Instructions

    Machines with relaxed memory consistency typically provide programmers with fence instructions to tighten the ordering of memory instructions. Insert MFENCE (memory fence) instructions in Dekker's and Peterson's algorithms to ensure their correct behavior on a multi-processor system that implements a store buffer with store forwarding. Use as few fence instructions as necessary.

Cache Coherency

Multiprocessor systems with caches use a coherency protocol, which ensures that writes by one processor eventually become visible to all other processors and that no two processors write to the same memory location simultaneously. One invalidation-based protocol discussed in the lecture is the MESI protocol.

Two processors (P1 and P2) and uniform memory are connected to a shared bus, which implements the MESI cache coherency protocol. In memory there exists a data structure with the following layout: DATA_A (8 bytes), DATA_B (24 bytes), DATA_C (8 bytes). The cacheline size is 32 bytes, so that DATA_A and DATA_B reside in cacheline X, whereas DATA_C resides in cacheline Y.

Processor P1 executes the following code:

// do something with DATA_A

Processor P2 executes the following code:

// do something with DATA_C
  1. The following table lists the memory operations of the individual processors as they appear on the shared bus. The first column lists the processor and the second column specifies the memory operation being carried out. The next two columns list the MESI state of the cachelines X and Y in each of the processors. The last two columns describe if the operation caused a cacheline transfer to/from memory or to/from another cache. Initially all cachelines are invalid (I).

    CPU Operation P1 P2 Memory Transfer Cache Transfer
    I I I I
    1 READ(DATA_A)
    1 READ(DATA_B)
    2 READ(DATA_B)
    2 READ(DATA_C)
    1 READ(DATA_B)
    2 READ(DATA_B)
  2. False Sharing

    Because DATA_B is contained in cacheline X, false sharing occurs with DATA_A. Discuss how to remedy this problem and explain the impact on memory and cache transfers.

Last modified: 9th Jun 2016, 3.44 PM
Author: Dr.-Ing. Carsten Weinhold

Michael Roitzsch

Phone: 463 42043
Fax: 463 38284
e-mail contact form

  • ModuleModules: INF-BI-1, INF-BAS4, INF-VERT4, DSE-E3
  • Credits6 Credit Points
  • 2/1/0 = 3 SWS
Time and Place

This course is not being offered in this term.