A New Page Table for 64-bit Address Spaces --- Talluri et al. - increasing address space size adversely affects page table an translation lookaside buffer (TLB) cost or performance; - the main idea of the paper is to show why conventional page tables are not suited to 64-bit address spaces, and define new table, called clustered page table. - clustered page tables are effective when spatial locality makes it likely that consecutive pages are in contemporaneous use; - the second contribution of the paper is to extend page tables to support superpage and subblock PTEs, used to feed the TLB. - a linear page table stores all PTEs for a process in a single array, indexed by the virtual page number (VPN). As they are very large, they reside in virtual space, using page faults to dynamically populate the table. - the extension of linear page tables to 64-bit addresses uses a virtual array with 4 thousand trillion entries and a 6-level tree. However, this design IS practical, as a portion of the TLB is reserved for mappings to the page tables and the tree is rarely traversed; - a forward-mapped page table store PTEs in n-ary trees, with each level of the tree indexed using fixed address fields in the VPN. However, 7 levels are needed to 64-bit address spaces, which the overhead not acceptable. - hashed page tables are other choices. Each PTE in the hash table, stores the mapping information for one page, a tag identifying the VPN, and a next pointer. A drawback is that the tag and the pointer are * 8 bytes each, resulting in 16 bytes of overhead for each 8 bytes of mapping information; One PTE for each physical page table. - clustered page tables are a variant of hashed page tables that store mapping information for several consecutives pages, with a single tag and next pointer. - each node in the hash table stores one tag but stores mapping information for multiple base pages that belong to the same page block - an aligned group of consecutive pages. The number of base pages in a page block is the subblock factor. - the VPN is split into a virtual page block number (VPBN) and a block offset (Boff); - shorter hash table lists (buckets); - CPT can require more space than hash tables, if the address space is really sparse, say 1 page used for each cluster; - CPT require the OS to acquire a single lock for an entire page block; this can restrict concurrent page table lookups (however, you can use reader/writer locks, allowing concurrent reads.) - to increase the hit ratio in 64-bit architectures, designers are using superpages, power-of-two multiples of base pages that are aligned in both virtual and physical memory; - when subblocking is used in the TLB, the data are is much harder (contains multiple PPNs.) One way to reduce the area is to store a single PPN and require physical pages for a PTE to be stored in a aligned block of physical memory (this is called partial-subblocking.) Valid bits make a partial-subblock TLB different from a superpage TLB, as just a subset of a subblock may be resident at a given time; - instead of having the TLB miss handler construct a superpage or partial-subblock PTE, a more efficient way would be to modify the page tables to store superpage or partial-subblock PTEs. - there are at least two solutions for supporting superpages that work for any page table: Replicate PTEs and Multiple Page Tables. The first solution stores a superpage PTE at the page table site of evey base page PTE. The drawbacks of Replicate PTEs is that we loose the space advantage of superpages in the page table and make more difficult the updating of a mapping. When using Multiple Page Tables, one page table is kept for each page size in use. - Linear Intermediate Nodes is used by aggregating a subtree in a linear page table for a superpage. Therefore, each entry in the tree can be either a pointer to the next level or a superpage PTE. - The same idea can be applied to forward-mapped page tables (Forward-Mapped Intermediate Nodes.) - another possibility is to assume a fixed superpage size and use this to index a hash page table. Among the disadvantages of this method is the problem of handling bigger superpages. - Talluri et al. conclude that the Replicate PTE method is probably the best method. - another technique for increasing the memory mapped by a TLB is complete subblocking. In this case, there are block misses and subblock misses. Subblock prefetching can be used, but increase the TLB miss penalty (how bad it is depends on the page table used.) - Talluri adds a field (S) in the clustered page table PTEs to indicate if the PTE is a subblock/superpage PTE or not. This way, the same table can have both S-PTEs and base page PTEs. "we are able to service TLB misses to both partial-subblock and base page PTEs without increasing the TLB miss penalty while using less memory for partial- subblock PTEs." - the authors define ways to handle superpages bigger than the total size of a PTE.