Architectural Support for Translation Table Management in Large Address Space Machines --- Huck et al., 1993

- the management of page tables varies from full hardware implementations to complete software algorithms;
- the Hashed Page Table (HPT) is presented in the paper; they claim a 5% to 10% improvement in simulations of over 4 billion instructions;
- the instruction set provides management instructions to enable and disable translation, change translations, and control the protection mode;
- the page table entry (PTE) holds protection information and status bits, usually reference and dirty bits;
- the PTE is a superset of the information found in a TLB entry;
- a significant number of CPU cycles can be consumed in TLB management. Measurements show overhead values ranging between 5% and 40%;
- main objectives of a VM system:
  - minimize the time to service TLB misses;
  - minimize the physical memory needed to maintain the translations for the currently mapped pages;
  - maximize the flexibility for software to support a variety of VM mechanisms and capabilities;
- computer systems may use VM systems based on hardware, software, or both. Hardware = performance, software = flexibility;
- implementations (organizations) usually optimize for a defined workload;
* factors like OOP and memory-mapped files encourage more sharing, increase the size of the address space, and decrease locality;
- hash tables: theory suggests prime-number sizes; practice uses powers of two;
- two dominant architectures: forward-mapped (usually multi-level) and inverted (reverse-mapped);
- inverted page tables (IPTs) have typically been used by large address space architectures;
* interesting: if lots of TLB misses occur, the cache may seem to have a better hit rate, as there will be more accesses to the page table;

Forward-mapped Page Tables:
---------------------------
- most architectures with this structure allow portions of the hierarchy to be unallocated by using validity bits in the higher levels;
- some architectures provide a short circuit, linking lower-level entries higher in the hierarchy, when the address space is sparsely used;
- generally a per-process table, with some control register pointing to the first level;
- the page table itself can be referenced virtually or physically. If physically, then some table manager controls the amount of physical memory being used;
- address space identifiers (ASIDs), or PIDs, are used to generate a global naming system, removing the need to flush the TLB on every context switch;
- provides a flexible VM structure, good for aliasing, copy-on-write, and independent protection views;
- total cost to service a miss:
  - overhead to suspend execution;
  - memory and/or cache references to each level;
  - possible TLB miss with virtual tables;
  - possible page fault on the table itself;
  - possible updates of the dirty and reference bits.
- good for largely sequential or densely packed addresses; bad for sparse addresses;
- an interesting variant is a single flat page table, indexed by the VPN;
* the introduction of just a single additional cache miss in TLB miss handling can greatly reduce performance;
* cache miss times are increasing from 10-20 cycles toward 30-70 cycles. Why? The increasing gap between memory access time and processor speed?
* TLB hit rate is also an issue. Sharing may be done by having multiple entries in the TLB or just one for the physical address. So, there are two bad things: sharing through multiple entries in the TLB and the lack of PIDs, forcing a TLB flush on every context switch;
- the space required for page tables is a function of the amount and distribution of allocated virtual memory. In the best case, all entries in the leaf pages are used;
- best case: 1% of overhead. Worst case (one used entry per leaf page): 100%. Typical case: 4% of overhead;
- PTE size probably needs to increase to address the increasing physical memory size;
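To make that miss cost concrete, here is a minimal sketch (in C) of a software handler walking a hypothetical two-level forward-mapped table. The names (root_table, translate), the 10/10/12 bit split, and the PTE layout are assumptions for illustration, not the paper's actual formats; each level costs one memory reference, and a NULL level-1 entry models the validity bits that let subtrees stay unallocated.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define LEVEL_BITS 10   /* assumed: 10 index bits per level */
#define PAGE_SHIFT 12   /* assumed: 4 KB pages              */
#define VALID_BIT  0x1u /* assumed: bit 0 = valid           */

typedef uint32_t pte_t; /* leaf entry: PPN plus status bits */

/* Level-1 table of pointers to leaf tables; NULL marks an
 * unallocated subtree (the higher-level validity check). */
static pte_t *root_table[1 << LEVEL_BITS];

/* Walk the two levels; returns the PTE, or 0 when the walk fails
 * (unallocated subtree or invalid entry -> page fault path). */
pte_t translate(uint32_t vaddr)
{
    uint32_t l1 = (vaddr >> (PAGE_SHIFT + LEVEL_BITS)) & ((1u << LEVEL_BITS) - 1);
    uint32_t l2 = (vaddr >> PAGE_SHIFT) & ((1u << LEVEL_BITS) - 1);

    pte_t *leaf = root_table[l1];   /* 1st memory reference */
    if (leaf == NULL)
        return 0;

    pte_t pte = leaf[l2];           /* 2nd memory reference */
    if (!(pte & VALID_BIT))
        return 0;
    return pte;                     /* would be loaded into the TLB */
}

int main(void)
{
    /* Install one mapping: VPN 2 -> PPN 7, valid. */
    root_table[0] = calloc(1 << LEVEL_BITS, sizeof(pte_t));
    root_table[0][2] = (7u << PAGE_SHIFT) | VALID_BIT;

    printf("PTE: 0x%08x\n", translate(2u << PAGE_SHIFT)); /* hit           */
    printf("PTE: 0x%08x\n", translate(1u << 31));         /* 0: unmapped   */
    return 0;
}
```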
Inverted Page Tables:
---------------------
- it is a single table with an entry for each physical page;
- to convert VPN->PPN, a hash anchor table (HAT) is used. It is indexed by a function of the virtual address and provides a pointer to a linked list of potential IPT entries;
- it is a global table, shared by all processes;
- the biggest difficulty is the support for address aliasing. Some systems support sharing by using a global address (the same virtual address in both address spaces);
* it is interesting to note that in a balanced system, each entry in the page table will be referenced only a few times (will have a small chain of candidates). This increases the probability of a cache miss when an entry is looked up. The same holds for each of the candidates;
- TLB miss overhead:
  - CPU overhead to suspend execution;
  - memory reference to the HAT pointer;
  - memory reference to the IPT;
  - possible memory references for the chain elements;
  - possible update of the status bits.
* generally these memory references suffer high cache miss rates;
- the size of the table is a linear function of the amount of physical memory;
- memory-mapped I/O systems can waste significant storage in an IPT if it cannot be efficiently packed;

Hashed Page Table:
------------------
- combines the hash table and the IPT into a single data structure;
- generates fewer memory references and better memory usage;
- an entry (HPTE) contains both the VPN and the PPN;
- a hash of the virtual address is used to index the table;
- this basically avoids using a separate hash table;
- by making the HPT very large, it acts as a complete replacement for the older hash table + IPT structure;
- since the HPT entry is nearly identical to a TLB entry, it is easy to construct a hardware miss handler;
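Since an HPTE carries both the VPN tag and the PPN, a single probe of one structure resolves the common case. Below is a sketch of such a lookup under assumed details: the XOR-fold hash, the field names, and the valid bit are illustrative, and a real tag would also include the ASID; the paper defines the actual HPTE format.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* One HPTE: holds the VPN tag *and* the PPN, unlike a classic IPT
 * entry; 'next' is the collision chain, empty in the common case. */
typedef struct hpte {
    uint64_t     vpn;
    uint64_t     ppn;
    uint32_t     flags;   /* protection/status; bit 0 = valid (assumed) */
    struct hpte *next;
} hpte_t;

#define HPT_SIZE 4096u    /* power of two, as in practice */
static hpte_t hpt[HPT_SIZE];

/* Illustrative hash: XOR-fold the VPN, mask by the table size. */
static inline uint32_t hpt_hash(uint64_t vpn)
{
    return (uint32_t)(vpn ^ (vpn >> 12)) & (HPT_SIZE - 1);
}

/* One memory reference in the no-collision case, versus a HAT
 * reference plus IPT reference(s) in an inverted table. */
hpte_t *hpt_lookup(uint64_t vpn)
{
    for (hpte_t *e = &hpt[hpt_hash(vpn)]; e != NULL; e = e->next)
        if ((e->flags & 1u) && e->vpn == vpn)
            return e;
    return NULL;         /* trap to software / page fault path */
}

int main(void)
{
    uint64_t vpn = 0xABCDEu;
    hpte_t *slot = &hpt[hpt_hash(vpn)];
    slot->vpn = vpn; slot->ppn = 42; slot->flags = 1;   /* insert */

    hpte_t *e = hpt_lookup(vpn);
    printf("ppn = %llu\n", e ? (unsigned long long)e->ppn : 0ull);
    return 0;
}
```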
----

- TLB software miss algorithm in a forward-mapped page table:
  - move the faulting address to a general register;
  - determine if the reference is to system or user space;
  - move the user root table pointer to a general register;
  - determine the privilege level;
  - determine if the reference is to a different process's address space;
  - calculate the index into the root table;
  - load the root table entry;
  - calculate the index into the leaf page;
  - load the PTE;
  - ...
- the author uses some tricks to minimize the TLB miss handling time, such as aligning the table according to its size (the base address is a multiple of the table size), substituting a simple OR for the addition when calculating the address of a hashed entry (a small sketch follows the conclusions);

Conclusions:
------------
- by making a simple change to traditional IPTs, HPTs decrease the number of memory references, decreasing the number of cache misses and the TLB miss handling time;
- varying the participation of hardware and software, they define a native HPT and a hybrid HPT. In both of them, the hardware searches only the first bucket. In a native HPT, the hardware implements the table and the software just searches the collision lists. In a hybrid HPT, the hardware searches just an HPT cache, trapping to software if the entry is not found.
- the HPT's performance is independent of the physical memory size, the amount of allocated VM, and the sparseness of the VM.
- if there are lots of TLB misses, there may be more data cache hits because of the larger number of accesses to the page table;
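A tiny self-checking sketch of the alignment trick mentioned above (all sizes and addresses here are made-up values): if the table's base address is a multiple of its power-of-two size, the base's low bits are zero, so OR-ing in the entry offset yields the same address as adding it.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE  0x10000u   /* 64 KB table, power of two (assumed) */
#define ENTRY_SHIFT 4          /* assumed 16-byte entries             */

int main(void)
{
    uint32_t base = 0x40000u;  /* multiple of TABLE_SIZE              */
    uint32_t hash = 0x0ABCu;   /* some hashed value                   */
    uint32_t off  = (hash << ENTRY_SHIFT) & (TABLE_SIZE - 1);

    /* Because base's low bits are zero, OR and ADD agree, so the
     * handler can skip the carry-propagating addition. */
    assert((base | off) == (base + off));
    return 0;
}
```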