TUD Logo

TUD Home » ... » Teaching » Real-Time Systems » Hardware

Operating Systems

Exercise: Hardware

Cache Coloring

For the following exercises assume the following 32-bit RISC processor architecture: 500 MHz CPU, 16 KB L1 I-Cache and 16 KB L1 D-Cache, 1 MB unified L2 Cache. The hardware page-size is 1 KB. All caches store 32 byte wide cache-lines. Cache-lines are replaced in LRU order. All caches are physically indexed and physically tagged.

  1. Determine the bits of the address relevant for indexing into the cache, the bits addressing the machine word in a cache-line (offset) and the bits to be stored in the tag memory assuming the above caches are:
    1. direct mapped caches
    2. 4 way set associative caches
    3. fully associative caches
  2. How many different colors result from the values in 1.a depending on the cache and the cache architecture? How many cache-lines of the same color can be stored before replacement happens?
  3. How many different colors would be available if cache coloring would be implemented on a page basis in the memory management part of the operating system?

    The algorithm applied for this scheme assigns physical page-frames to virtual pages such, that data on these pages is not evicted from the caches when other non-realtime programs are running.

  4. Compute the worst case execution time with and without cache-coloring for the following program performing the multiplication C = A * B where A and B are 2×2 matrixes. Assume the program will not be preempted during a single matrix multiplication and will be called multiple times. Which colors does the program occupy in the different caches and architectures?

    The following program is given in RISC pseudo code, assume each basic operation takes one cycle, memory addresses take 1 cycles for L1 hits, 10 for L2 hits and 50 for L2 misses. Pipeline stalls can be neglected.

    How WCET analysis changes if matrix B is stored at address 0x80002000, instead of 0x2000? Reason about a system with and without cache-coloring. How the situation is different with 4-way associative cache in comparison to direct mapped cache?

    Address Statement Comment
    0x1000 R1 := 0x2000 // address of A, A is stored as the array [a1,a2,a3,a4]
    0x1004 R2 := 0x2100 // address of B, B is stored as the array [b1,b2,b3,b4]
    0x1008 R3 := 0x2200 // address of C, C is stored as the array [c1,c2,c3,c4]
    0x100c R4 := 0 // i = 0
    0x1010 R5 := 0 // j = 0
    0x1014 R7 := 0 // cij = 0
    0x1018 R6 := 0 // k = 0
    0x101c R8 := ld[R1 + R4 + R6 * 2] // load aik
    0x1020 R9 := ld[R2 + R6 + R5 * 2] // load bkj
    0x1024 R7 := R7 + R8 * R9 // cij += aik * bkj
    0x1028 R6 += 4 // k++
    0x102c br R6 < 8 to 0x101c // test of for clause (k < 2)
    0x1030 st[R3 + R4 + R5 * 2] := R7 // store cij
    0x1034 R5 += 4 // j++
    0x1038 br R5 < 8 to 0x1014 // test of for clause (j < 2)
    0x103c R4 += 4 // i++
    0x1040 br R4 < 8 to 0x1010 // test of for clause (i < 2)
Last modified: 1st Feb 2016, 6.03 PM
Author: M.Sc. Maksym Planeta

Maksym Planeta

  • ModuleModules: INF-BAS4, INF-VERT4, DSE-E9, INF-LE-EUI
  • Credits6 Credit Points
  • 2/1/0 = 3 SWS
Time and Place

This course is not being offered in this term.