

## Alexandria University

## **Faculty of Engineering**

**Division of Communications & Electronics** 

## CSx35 Computer Architecture Sheet 3

**1.** A cache has the following parameters: b, block size given in number of words; S, number of sets; N, number of ways; and A, number of address bits.

(a) In terms of the parameters described, what is the cache capacity, C?

(b) In terms of the parameters described, what is the total number of bits required to store the tags?

(c) What are S and N for a fully associative cache of capacity C words with block size b?

(d) What is S for a direct mapped cache of size C words and block size b?
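The formulas asked for in parts (a) and (b) can be checked numerically with a short Python sketch. It assumes byte addressing with 4-byte words (as in MIPS), so the two lowest address bits form a byte offset that is not stored in the tag; the function names `cache_geometry` and `log2_exact` are hypothetical.

```python
# Sketch only. Assumption: byte addresses with 4-byte words, so the 2 lowest
# address bits are a byte offset. cache_geometry/log2_exact are hypothetical names.

def log2_exact(x):
    """log2 of a power of two, computed without floating point."""
    assert x > 0 and x & (x - 1) == 0, "expected a power of two"
    return x.bit_length() - 1

def cache_geometry(b, S, N, A):
    """Return (capacity C in words, tag bits per block, total tag bits)."""
    C = b * S * N                                 # words/block x sets x ways
    tag = A - log2_exact(S) - log2_exact(b) - 2   # bits left after set, block, byte fields
    return C, tag, tag * S * N                    # one tag per block; S * N blocks

print(cache_geometry(b=1, S=16, N=1, A=32))   # a 16-word direct mapped cache
```

Restricting the inputs to powers of two keeps the field widths integral, which is what the symbolic answers implicitly assume.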

**2.** A 16-word cache has the parameters given in Exercise 1. Consider the following repeating sequence of lw addresses (given in hexadecimal):

40 44 48 4C 70 74 78 7C 80 84 88 8C 90 94 98 9C 0 4 8 C 10 14 18 1C 20

Assuming least recently used (LRU) replacement for associative caches, determine the effective miss rate if the sequence is input to the following caches, ignoring startup effects (i.e., compulsory misses).

(a) direct mapped cache, b = 1 word

(b) fully associative cache, b = 1 word

(c) two-way set associative cache, b = 1 word

(d) direct mapped cache, b = 2 words
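To work out the miss rates by hand, it helps to tabulate the set index and tag of every address. The helper below is a hypothetical sketch that assumes byte addresses and 4-byte words: it converts each address to a word address, then to a block number, and splits that block number into a set index (low bits) and a tag (remaining bits).

```python
# Sketch only. Assumptions: byte addresses, 4-byte words, power-of-two sizes.
# map_addresses is a hypothetical helper name.

def map_addresses(addresses, capacity_words, block_words, ways):
    """Return (address, set index, tag) for each byte address."""
    num_sets = capacity_words // (block_words * ways)
    rows = []
    for addr in addresses:
        block = (addr // 4) // block_words        # byte -> word -> block number
        rows.append((hex(addr), block % num_sets, block // num_sets))
    return rows

SEQ = [0x40, 0x44, 0x48, 0x4C, 0x70, 0x74, 0x78, 0x7C, 0x80, 0x84, 0x88, 0x8C,
       0x90, 0x94, 0x98, 0x9C, 0x0, 0x4, 0x8, 0xC, 0x10, 0x14, 0x18, 0x1C, 0x20]

for row in map_addresses(SEQ, 16, 1, 1):   # case (a): direct mapped, b = 1 word
    print(row)
```

Addresses that share a set index but carry different tags compete for that set; in case (a), for example, 0x40, 0x80, and 0x0 all map to set 0.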

**3.** Repeat Exercise 2 for the following repeating sequence of lw addresses (given in hexadecimal) and cache configurations. The cache capacity is still 16 words.

74 A0 78 38C AC 84 88 8C 7C 34 38 13C 388 18C

(a) direct mapped cache, b = 1 word

(b) fully associative cache, b = 2 words

(c) two-way set associative cache, b = 2 words

(d) direct mapped cache, b = 4 words
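Hand answers for Exercises 2 and 3 can be cross-checked with a small LRU cache simulator. This is a sketch under stated assumptions: byte addresses, 4-byte words, an initially empty cache, and power-of-two sizes; the function name `steady_state_miss_rate` is hypothetical. It runs the sequence once to absorb compulsory misses and counts misses only on a second pass. That second pass is already the steady state, because an LRU cache ends every full pass of a repeating sequence in the same state.

```python
from collections import OrderedDict

# Sketch only. Assumptions: byte addresses, 4-byte words, power-of-two sizes.
# steady_state_miss_rate is a hypothetical name.

def steady_state_miss_rate(addresses, capacity_words, block_words, ways):
    """Miss rate of an LRU cache on a sequence that repeats forever."""
    num_sets = capacity_words // (block_words * ways)
    sets = [OrderedDict() for _ in range(num_sets)]   # keys: block numbers, LRU first

    def access(addr):
        block = (addr // 4) // block_words
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # hit: mark as most recently used
            return False
        if len(lines) == ways:
            lines.popitem(last=False)      # miss in a full set: evict the LRU block
        lines[block] = None
        return True

    for addr in addresses:                 # warm-up pass (compulsory misses)
        access(addr)
    misses = sum(access(addr) for addr in addresses)
    return misses / len(addresses)

EX2 = [0x40, 0x44, 0x48, 0x4C, 0x70, 0x74, 0x78, 0x7C, 0x80, 0x84, 0x88, 0x8C,
       0x90, 0x94, 0x98, 0x9C, 0x0, 0x4, 0x8, 0xC, 0x10, 0x14, 0x18, 0x1C, 0x20]
EX3 = [0x74, 0xA0, 0x78, 0x38C, 0xAC, 0x84, 0x88, 0x8C, 0x7C, 0x34, 0x38,
       0x13C, 0x388, 0x18C]

print(steady_state_miss_rate(EX3, 16, 1, 1))   # Exercise 3 (a): direct mapped, b = 1
```

Each configuration is one call. A fully associative cache is the special case where `ways` equals the number of blocks, for example `ways=8` with `block_words=2` for a 16-word cache.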

**4.** Suppose you are running a program with the following data access pattern. The pattern is executed only once.

0x0 0x8 0x10 0x18 0x20 0x28

(a) If you use a direct mapped cache with a cache size of 1 KB and a block size of 8 bytes (2 words), how many sets are in the cache?

(b) With the same cache and block size as in part (a), what is the miss rate of the direct mapped cache for the given memory access pattern?

(c) For the given memory access pattern, which of the following would decrease the miss rate the most? (Cache capacity is kept constant.) Circle one.

(i) Increasing the degree of associativity to 2.

(ii) Increasing the block size to 16 bytes.

(iii) Either (i) or (ii).

(iv) Neither (i) nor (ii).
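Because the pattern runs only once, a single-pass simulation from an empty cache is enough to compare configurations. The sketch below assumes byte addresses; `direct_mapped_run` is a hypothetical helper name.

```python
# Sketch only. Assumptions: byte addresses, initially empty cache,
# power-of-two sizes. direct_mapped_run is a hypothetical name.

def direct_mapped_run(addresses, cache_bytes, block_bytes):
    """Run a trace once through a direct mapped cache; return (sets, misses)."""
    num_sets = cache_bytes // block_bytes
    tags = [None] * num_sets                 # None marks an invalid (empty) line
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        index, tag = block % num_sets, block // num_sets
        if tags[index] != tag:               # empty line or tag mismatch -> miss
            misses += 1
            tags[index] = tag
    return num_sets, misses

pattern = [0x0, 0x8, 0x10, 0x18, 0x20, 0x28]
print(direct_mapped_run(pattern, 1024, 8))    # 8-byte blocks
print(direct_mapped_run(pattern, 1024, 16))   # 16-byte blocks, same capacity
```

Associativity only removes conflict misses, while a larger block lets one miss bring in neighboring addresses, so comparing the two runs shows which effect matters for this pattern.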

**5.** You are building an instruction cache for a MIPS processor. It has a total capacity of $4C = 2^{c+2}$ bytes. It is $N = 2^n$-way set associative ($N \ge 8$), with a block size of $b = 2^{b'}$ bytes ($b \ge 8$). Give your answers to the following questions in terms of these parameters.

(a) Which bits of the address are used to select a word within a block?

(b) Which bits of the address are used to select the set within the cache?

(c) How many bits are in each tag?

(d) How many tag bits are in the entire cache?
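Plugging concrete numbers into the symbolic answers is a quick consistency check. The sketch below assumes 32-bit byte addresses and 4-byte instructions, so bits 1:0 are a byte offset; it writes `bp` for $b'$, the name `icache_fields` is hypothetical, and the example values ($c = 10$, $n = 3$, $b' = 4$, i.e. a 4 KB, 8-way cache with 16-byte blocks) are arbitrary.

```python
# Sketch only. Assumptions: 32-bit byte addresses, 4-byte instructions
# (bits [1:0] are the byte offset). icache_fields is a hypothetical name.

def icache_fields(c, n, bp, addr_bits=32):
    """Fields for a 2**(c+2)-byte, 2**n-way cache with 2**bp-byte blocks.

    Bit ranges are returned as (high, low), inclusive.
    """
    set_bits = (c + 2) - n - bp            # log2(capacity / (ways * block size))
    word_select = (bp - 1, 2)              # word within block, above the byte offset
    set_select = (bp + set_bits - 1, bp)   # set index, above the block offset
    tag_bits = addr_bits - set_bits - bp   # remaining high-order bits
    blocks = 2 ** (c + 2 - bp)             # capacity / block size, one tag each
    return word_select, set_select, tag_bits, tag_bits * blocks

print(icache_fields(c=10, n=3, bp=4))   # 4 KB, 8-way, 16-byte blocks
```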

**6.** Consider a cache with the following parameters:

N (associativity) = 2, b (block size) = 2 words, W (word size) = 32 bits, C (cache size) = 32 K words, A (address size) = 32 bits. You need to consider only word addresses.

(a) Show the tag, set, block offset, and byte offset bits of the address. State how many bits are needed for each field.

(b) What is the size of all the cache tags in bits?

(c) Suppose each cache block also has a valid bit (V) and a dirty bit (D). What is the size of each cache set, including data, tag, and status bits?

(d) Design the cache using the building blocks in Figure 1 and a small number of two-input logic gates. The cache design must include tag storage, data storage, address comparison, data output selection, and any other parts you feel are relevant. Note that the multiplexer and comparator blocks may be any size (n or p bits wide, respectively), but the SRAM blocks must be $16K \times 4$ bits. Be sure to include a neatly labeled block diagram. You need only design the cache for reads.



Figure 1: Building blocks
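Parts (a) to (c) can be checked with a short sketch that derives the field widths and then slices an address into those fields with shifts and masks. It assumes byte addresses with 4-byte words and power-of-two sizes; the helper names are hypothetical, and 0x12345678 is just an arbitrary example address.

```python
# Sketch only. Assumptions: byte addresses, 4-byte (32-bit) words,
# power-of-two sizes. All helper names are hypothetical.

def field_widths(N, b, C, A=32, word_bytes=4):
    """Widths of (tag, set, block offset, byte offset) in bits.

    N: ways, b: words per block, C: capacity in words, A: address bits.
    """
    byte_off = (word_bytes - 1).bit_length()   # log2(bytes per word)
    block_off = (b - 1).bit_length()           # log2(words per block)
    set_bits = (C // (b * N) - 1).bit_length() # log2(number of sets)
    return A - set_bits - block_off - byte_off, set_bits, block_off, byte_off

def split_address(addr, widths):
    """Extract (tag, set, block offset, byte offset) from a byte address."""
    _, set_w, blk_w, byte_w = widths
    byte_off = addr & ((1 << byte_w) - 1)
    addr >>= byte_w
    block_off = addr & ((1 << blk_w) - 1)
    addr >>= blk_w
    set_index = addr & ((1 << set_w) - 1)
    return addr >> set_w, set_index, block_off, byte_off

def set_size_bits(N, b, tag_bits, word_bits=32, status_bits=2):
    """Bits per set: each of the N ways holds b data words, a tag, V and D."""
    return N * (b * word_bits + tag_bits + status_bits)

widths = field_widths(N=2, b=2, C=32 * 1024)
print(widths)
print([hex(f) for f in split_address(0x12345678, widths)])
print(set_size_bits(N=2, b=2, tag_bits=widths[0]))
```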

**7.** You've joined a hot new Internet startup to build wrist watches with a built-in pager and Web browser. It uses an embedded processor with a multilevel cache scheme depicted in Figure 2. The processor includes a small on-chip cache in addition to a large off-chip second-level cache. (Yes, the watch weighs 3 pounds, but you should see it surf!)



Figure 2: Computer system

Assume that the processor uses 32-bit physical addresses but accesses data only on word boundaries. The caches have the characteristics given in Table 1. The DRAM has an access time of $t_m$ and a size of 512 MB.

| Characteristic   | On-chip Cache            | Off-chip Cache |
|------------------|--------------------------|----------------|
| Organization     | Four-way set associative | Direct mapped  |
| Hit rate         | A                        | B              |
| Access time      | $t_a$                    | $t_b$          |
| Block size       | 16 bytes                 | 16 bytes       |
| Number of blocks | 512                      | 256K           |

Table 1: Memory characteristics

(a) For a given word in memory, what is the total number of locations in which it might be found in the on-chip cache and in the second-level cache?

(b) What is the size, in bits, of each tag for the on-chip cache and the second-level cache?

(c) Give an expression for the average memory read access time. The caches are accessed in sequence.

(d) Measurements show that, for a particular problem of interest, the on-chip cache hit rate is 85% and the second-level cache hit rate is 90%. However, when the on-chip cache is disabled, the second-level cache hit rate shoots up to 98.5%. Give a brief explanation of this behavior.
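The tag sizes in part (b) and the expression in part (c) can be sketched as follows. It assumes the tags cover the full 32-bit physical address (if only the 29 bits needed for 512 MB of DRAM were stored, each tag would be 3 bits shorter) and that a read probes the on-chip cache, then the off-chip cache, then DRAM, paying each level's access time in turn. The function names are hypothetical.

```python
# Sketch only. Assumptions: tags cover all 32 physical address bits; the
# levels are accessed in sequence. Function names are hypothetical.

def tag_bits(addr_bits, num_blocks, ways, block_bytes):
    """Tag width = address bits - set index bits - block offset bits."""
    num_sets = num_blocks // ways
    return addr_bits - (num_sets - 1).bit_length() - (block_bytes - 1).bit_length()

def amat_two_level(t_a, t_b, t_m, A, B):
    """Every read pays t_a; on-chip misses also pay t_b; misses in both
    caches also pay the DRAM access time t_m."""
    return t_a + (1 - A) * (t_b + (1 - B) * t_m)

print(tag_bits(32, 512, 4, 16))          # on-chip: four-way, 512 blocks
print(tag_bits(32, 256 * 1024, 1, 16))   # off-chip: direct mapped, 256K blocks
```

With arbitrary illustrative timings $t_a = 1$, $t_b = 10$, $t_m = 100$ and the hit rates from part (d), `amat_two_level(1, 10, 100, 0.85, 0.90)` evaluates $1 + 0.15(10 + 0.1 \cdot 100) = 4$ time units.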

**8.** You are building a computer with a hierarchical memory system that consists of separate instruction and data caches followed by main memory. You are using the MIPS multicycle processor running at 1 GHz.

(a) Suppose the instruction cache is perfect (i.e., always hits) but the data cache has a 5% miss rate. On a cache miss, the processor stalls for 60 ns to access main memory, then resumes normal operation. Taking cache misses into account, what is the average memory access time?

(b) How many clock cycles per instruction (CPI) on average are required for load and store word instructions considering the non-ideal memory system?

(c) Consider a benchmark application that has 25% loads, 10% stores, 11% branches, 2% jumps, and 52% R-type instructions. Taking the non-ideal memory system into account, what is the average CPI for this benchmark?

(d) Now suppose that the instruction cache is also non-ideal and has a 7% miss rate. What is the average CPI for the benchmark in part (c)? Take into account both instruction and data cache misses.
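The CPI arithmetic for parts (b) to (d) can be sketched as below. It assumes the cycle counts of the textbook MIPS multicycle processor (lw 5, sw 4, beq 3, j 3, R-type 4), a 1 ns clock so that a 60 ns stall costs 60 cycles, and that every miss simply adds the full stall to the instruction's cycle count. The names `BASE_CYCLES` and `average_cpi` are hypothetical.

```python
# Sketch only. Assumption: cycle counts of the textbook MIPS multicycle
# processor with no stalls. BASE_CYCLES and average_cpi are hypothetical names.
BASE_CYCLES = {"lw": 5, "sw": 4, "beq": 3, "j": 3, "rtype": 4}

def average_cpi(mix, d_miss, i_miss, penalty_cycles):
    """Average CPI including cache stalls.

    mix maps instruction class -> fraction (fractions sum to 1). Every
    instruction is fetched through the instruction cache; only loads and
    stores also access the data cache.
    """
    cpi = 0.0
    for kind, fraction in mix.items():
        cycles = BASE_CYCLES[kind] + i_miss * penalty_cycles
        if kind in ("lw", "sw"):
            cycles += d_miss * penalty_cycles
        cpi += fraction * cycles
    return cpi

MIX = {"lw": 0.25, "sw": 0.10, "beq": 0.11, "j": 0.02, "rtype": 0.52}
PENALTY = 60   # 60 ns stall with a 1 ns (1 GHz) clock
print(average_cpi(MIX, d_miss=0.05, i_miss=0.0, penalty_cycles=PENALTY))
print(average_cpi(MIX, d_miss=0.05, i_miss=0.07, penalty_cycles=PENALTY))
```

Exercise 9 uses the same function with `d_miss=0.15`, `i_miss=0.10`, and a 200-cycle penalty.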

**9.** Repeat Exercise 8 with the following parameters.

(a) The instruction cache is perfect (i.e., always hits) but the data cache has a 15% miss rate. On a cache miss, the processor stalls for 200 ns to access main memory, then resumes normal operation. Taking cache misses into account, what is the average memory access time?

(b) How many clock cycles per instruction (CPI) on average are required for load and store word instructions considering the non-ideal memory system?

(c) Consider a benchmark application that has 25% loads, 10% stores, 11% branches, 2% jumps, and 52% R-type instructions. Taking the non-ideal memory system into account, what is the average CPI for this benchmark?

(d) Now suppose that the instruction cache is also non-ideal and has a 10% miss rate. What is the average CPI for the benchmark in part (c)? Take into account both instruction and data cache misses.

**10.** If a computer uses 64-bit virtual addresses, how much virtual memory can it access? Note that $2^{40}$ bytes = 1 terabyte, $2^{50}$ bytes = 1 petabyte, and $2^{60}$ bytes = 1 exabyte.

**11.** Consider a virtual memory system that can address a total of $2^{32}$ bytes. You have unlimited hard drive space, but are limited to only 8 MB of semiconductor (physical) memory. Assume that virtual and physical pages are each 4 KB in size.

(a) How many bits is the physical address?

(b) What is the maximum number of virtual pages in the system?

(c) How many physical pages are in the system?

(d) How many bits are the virtual and physical page numbers?

(e) Suppose that you come up with a direct mapped scheme that maps virtual pages to physical pages. The mapping uses the least significant bits of the virtual page number to determine the physical page number. How many virtual pages are mapped to each physical page? Why is this "direct mapping" a bad plan?

(f) Clearly, a more flexible and dynamic scheme for translating virtual addresses into physical addresses is required than the one described in part (e). Suppose you use a page table to store mappings (translations from virtual page number to physical page number). How many page table entries will the page table contain?

(g) Assume that, in addition to the physical page number, each page table entry also contains some status information in the form of a valid bit (V) and a dirty bit (D). How many bytes long is each page table entry? (Round up to an integer number of bytes.)

(h) Sketch the layout of the page table. What is the total size of the page table in bytes?
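The sizing steps can be sketched as a single function that works from the page offset outward. It assumes a single-level page table with one entry per virtual page, entries holding the physical page number plus V and D bits rounded up to whole bytes, and power-of-two sizes; the name `page_table_params` is hypothetical.

```python
# Sketch only. Assumptions: single-level table, one entry per virtual page,
# power-of-two sizes. page_table_params is a hypothetical name.

def page_table_params(va_bits, phys_bytes, page_bytes, status_bits=2):
    """Sizing for a page table whose entries hold PPN + status bits."""
    offset_bits = (page_bytes - 1).bit_length()   # log2(page size)
    pa_bits = (phys_bytes - 1).bit_length()       # log2(physical memory size)
    vpn_bits = va_bits - offset_bits
    ppn_bits = pa_bits - offset_bits
    entry_bytes = -(-(ppn_bits + status_bits) // 8)   # ceiling division
    entries = 2 ** vpn_bits
    return {
        "pa_bits": pa_bits,
        "virtual_pages": entries,
        "physical_pages": 2 ** ppn_bits,
        "vpn_bits": vpn_bits,
        "ppn_bits": ppn_bits,
        "entry_bytes": entry_bytes,
        "table_bytes": entries * entry_bytes,
    }

print(page_table_params(32, 8 * 2**20, 4 * 2**10))   # Exercise 11 parameters
```

Exercise 12 is the same call with a 50-bit virtual address, 2 GB of physical memory, and 4 KB pages.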

**12.** Consider a virtual memory system that can address a total of $2^{50}$ bytes. You have unlimited hard drive space, but are limited to 2 GB of semiconductor (physical) memory. Assume that virtual and physical pages are each 4 KB in size.

(a) How many bits is the physical address?

(b) What is the maximum number of virtual pages in the system?

(c) How many physical pages are in the system?

(d) How many bits are the virtual and physical page numbers?

(e) How many page table entries will the page table contain?

(f) Assume that, in addition to the physical page number, each page table entry also contains some status information in the form of a valid bit (V) and a dirty bit (D). How many bytes long is each page table entry? (Round up to an integer number of bytes.)

(g) Sketch the layout of the page table. What is the total size of the page table in bytes?

**13.** You decide to speed up the virtual memory system of Exercise 11 by using a translation lookaside buffer (TLB). Suppose your memory system has the characteristics shown in Table 2. The TLB and cache miss rates indicate how often the requested entry is not found. The main memory miss rate indicates how often page faults occur.

| Memory Unit | Access Time (Cycles) | Miss Rate |
|-------------|----------------------|-----------|
| TLB         | 1                    | 0.05%     |
| Cache       | 1                    | 2%        |
| Main memory | 100                  | 0.0003%   |
| Hard drive  | 1,000,000            | 0%        |

Table 2: Memory characteristics

(a) What is the average memory access time of the virtual memory system before and after adding the TLB? Assume that the page table is always resident in physical memory and is never held in the data cache.

(b) If the TLB has 64 entries, how big (in bits) is the TLB? Give numbers for data (physical page number), tag (virtual page number), and valid bits of each entry. Show your work clearly.

(c) Sketch the TLB. Clearly label all fields and dimensions.

(d) What size SRAM would you need to build the TLB described in part (c)? Give your answer in terms of depth × width.
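Part (b) can be sketched assuming a fully associative TLB, so that each entry's tag is the whole virtual page number, and reusing the page-number widths of Exercise 11 (a 20-bit virtual page number and an 11-bit physical page number for 32-bit virtual addresses, 8 MB of physical memory, and 4 KB pages). The name `tlb_size` is hypothetical, and the SRAM shape assumes each entry is stored as one SRAM word.

```python
# Sketch only. Assumptions: fully associative TLB (tag = full VPN), one SRAM
# word per entry, Exercise 11 page-number widths. tlb_size is hypothetical.

def tlb_size(entries, vpn_bits, ppn_bits, valid_bits=1):
    """Per-entry width, total bits, and SRAM shape (depth, width) of a TLB."""
    width = vpn_bits + ppn_bits + valid_bits   # tag + data + valid
    return {"entry_bits": width, "total_bits": entries * width, "sram": (entries, width)}

print(tlb_size(64, vpn_bits=20, ppn_bits=11))
```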

**14.** The virtual memory system you are designing uses a single-level page table built from dedicated hardware (SRAM and associated logic). It supports 25-bit virtual addresses, 22-bit physical addresses, and $2^{16}$-byte (64 KB) pages. Each page table entry contains a physical page number, a valid bit (V), and a dirty bit (D).

(a) What is the total size of the page table, in bits?

(b) The operating system team proposes reducing the page size from 64 to 16 KB, but the hardware engineers on your team object on the grounds of added hardware cost. Explain their objection.

(c) The page table is to be integrated on the processor chip, along with the on-chip cache. The on-chip cache deals only with physical (not virtual) addresses. Is it possible to access the appropriate set of the on-chip cache concurrently with the page table access for a given memory access? Explain briefly the relationship that is necessary for concurrent access to the cache set and page table entry.

(d) Is it possible to perform the tag comparison in the on-chip cache concurrently with the page table access for a given memory access? Explain briefly.
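Parts (a) and (b) reduce to counting entries and entry widths, and part (c) hinges on whether the cache's set index and block offset come entirely from the untranslated page offset. A sketch, with hypothetical function names; the cache geometries in the test are made-up examples, since the exercise does not specify the on-chip cache.

```python
# Sketch only. Function names are hypothetical; cache geometries passed to
# index_within_page_offset are illustrative, not given by the exercise.

def page_table_bits(va_bits, pa_bits, page_bytes, status_bits=2):
    """(entries, bits per entry, total bits) of a single-level page table."""
    offset_bits = (page_bytes - 1).bit_length()         # log2(page size)
    entries = 2 ** (va_bits - offset_bits)              # one entry per virtual page
    entry_bits = (pa_bits - offset_bits) + status_bits  # PPN + V + D
    return entries, entry_bits, entries * entry_bits

def index_within_page_offset(page_bytes, num_sets, block_bytes):
    """True if set index + block offset bits lie inside the page offset,
    so the cache set can be read while translation is in progress."""
    return num_sets * block_bytes <= page_bytes

print(page_table_bits(25, 22, 64 * 1024))   # 64 KB pages
print(page_table_bits(25, 22, 16 * 1024))   # 16 KB pages
```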