# TUD TUD Startseite » ... » Lehre » Echtzeitsysteme » Hardware

# Übung: Hardware

## Cache Coloring

For the following exercises assume the following 32-bit RISC processor architecture: 500 MHz CPU, 16 KB L1 I-Cache and 16 KB L1 D-Cache, 1 MB unified L2 Cache. The hardware page-size is 1 KB. All caches store 32 byte wide cache-lines. Cache-lines are replaced in LRU order. All caches are physically indexed and physically tagged.

1. Determine the bits of the address relevant for indexing into the cache, the bits addressing the machine word in a cache-line (offset) and the bits to be stored in the tag memory assuming the above caches are:
1. direct mapped caches
2. 4 way set associative caches
3. fully associative caches
2. How many different colors result from the values in 1.a depending on the cache and the cache architecture? How many cache-lines of the same color can be stored before replacement happens?
3. How many different colors would be available if cache coloring would be implemented on a page basis in the memory management part of the operating system?

The algorithm applied for this scheme assigns physical page-frames to virtual pages such, that data on these pages is not evicted from the caches when other non-realtime programs are running.

4. Compute the worst case execution time with and without cache-coloring for the following program performing the multiplication C = A * B where A and B are 2×2 matrixes. Assume the program will not be preempted during a single matrix multiplication and will be called multiple times. Which colors does the program occupy in the different caches and architectures?

The following program is given in RISC pseudo code, assume each basic operation takes one cycle, memory addresses take 1 cycles for L1 hits, 10 for L2 hits and 50 for L2 misses. Pipeline stalls can be neglected.

How WCET analysis changes if matrix B is stored at address 0x80002000, instead of 0x2000? Reason about a system with and without cache-coloring. How the situation is different with 4-way associative cache in comparison to direct mapped cache?

 0x1000 R1 := 0x2000 // address of A, A is stored as the array [a1,a2,a3,a4] 0x1004 R2 := 0x2100 // address of B, B is stored as the array [b1,b2,b3,b4] 0x1008 R3 := 0x2200 // address of C, C is stored as the array [c1,c2,c3,c4] 0x100c R4 := 0 // i = 0 0x1010 R5 := 0 // j = 0 0x1014 R7 := 0 // cij = 0 0x1018 R6 := 0 // k = 0 0x101c R8 := ld[R1 + R4 + R6 * 2] // load aik 0x1020 R9 := ld[R2 + R6 + R5 * 2] // load bkj 0x1024 R7 := R7 + R8 * R9 // cij += aik * bkj 0x1028 R6 += 4 // k++ 0x102c br R6 < 8 to 0x101c // test of for clause (k < 2) 0x1030 st[R3 + R4 + R5 * 2] := R7 // store cij 0x1034 R5 += 4 // j++ 0x1038 br R5 < 8 to 0x1014 // test of for clause (j < 2) 0x103c R4 += 4 // i++ 0x1040 br R4 < 8 to 0x1010 // test of for clause (i < 2)
Stand: 6.1.2020, 14:15 Uhr
Autor: M.Sc. Maksym Planeta

M.Sc.
Maksym Planeta

##### Regelungen
• Module: INF-BAS4, INF-VERT4, DSE-E9, INF-LE-EUI
• 6 Leistungspunkte
• 2/1/0 = 3 SWS
##### Zeit und Ort

Dieser Kurs wird in diesem Semester nicht angeboten.