Address Translation

Dynamic (Hardware-based) Relocation

To gain some understanding of hardware-based address translation, we’ll first discuss its first incarnation: base and bounds; the technique is also referred to as dynamic relocation;

we’ll need two hardware registers within each CPU: one is called the base register, and the other the bounds (sometimes called a limit register). when any memory reference is generated by the process, it is translated by the processor in the following manner:

physical address = virtual address + base

We should note that the base and bounds registers are hardware structures kept on the chip (one pair per CPU). Sometimes people call the part of the processor that helps with address translation the memory management unit (MMU) ;

Unfortunately, this simple technique of dynamic relocation does have its inefficiencies. internal fragmentation, as the space inside the allocated unit is not all used (i.e., is fragmented) and thus wasted. Our first attempt will be a slight generalization of base and bounds known as segmentation, which we will discuss next.

Segmentation

How do we support a large address space with (potentially) a lot of free space between the stack and the heap?

Segmentation: Generalized Base/Bounds

The idea is simple: instead of having just one base and bounds pair in our MMU, why not have a base and bounds pair per logical segment of the address space? What segmentation allows the OS to do is to place each one of those segments in different parts of physical memory, and thus avoid filling physical memory with unused virtual address space.

The term segmentation fault or violation arises from a memory access on a segmented machine to an illegal address.

Which Segment Are We Referring To?

One common approach, sometimes referred to as an explicit approach, is to chop up the address space into segments based on the top few bits of the virtual address;

In the implicit approach, the hardware determines the segment by noticing how the address was formed. If, for example, the address was generated from the program counter (i.e., it was an instruction fetch), then the address is within the code segment; if the address is based off of the stack or base pointer, it must be in the stack segment; any other address must be in the heap.

What About The Stack?

it grows backwards (i.e., towards lower addresses). The first thing we need is a little extra hardware support. Instead of just base and bounds values, the hardware also needs to know which way the segment grows (a bit, for example, that is set to 1 when the segment grows in the positive direction, and 0 for negative).

To support sharing, we need a little extra support from the hardware, in the form of protection bits. Basic support adds a few bits per segment, indicating whether or not a program can read or write a segment, or perhaps execute code that lies within the segment. By setting a code segment to read-only, the same code can be shared across multiple processes, without worry of harming isolation;

Segment	Base	Size (max 4K)	Grows Positive?	Protection
Code	32K	2K	1	Read-Execute
Heap	34K	3K	1	Read-Write
Stack	28K	2K	0	Read-Write

Fine-grained vs. Coarse-grained Segmentation

Supporting many segments requires even further hardware support, with a segment table of some kind stored in memory. Such segment tables usually support the creation of a very large number of segments, and thus enable a system to use segments in more flexible ways than we have thus far discussed.

The general problem that arises is that physical memory quickly becomes full of little holes of free space, making it difficult to allocate new segments, or to grow existing ones. We call this problem external fragmentation. One can try to use smart algorithms or periodically compact memory, but the problem is fundamental and hard to avoid.

Free-Space Management

Low-level Mechanisms

Splitting and Coalescing

splitting is commonly used in allocators when requests are smaller than the size of any particular free chunk.

coalescing. Coalesce free space when a chunk of memory is freed. The idea is simple: when returning a free chunk in memory, look carefully at the addresses of the chunk you are returning as well as the nearby chunks of free space; if the newly-freed space sits right next to one (or two, as in this example) existing free chunks, merge them into a single larger free chunk.

Tracking The Size Of Allocated Regions

To accomplish this task, most allocators store a little bit of extra information in a header block which is kept in memory, usually just before the handed-out chunk of memory.

The header minimally contains the size of the allocated region; it may also contain additional pointers to speed up deallocation, a magic number to provide additional integrity checking, and other information.

typedef struct {
	int size;
	int magic;
} header_t;

when a user requests N bytes of memory, the library does not search for a free chunk of size N; rather, it searches for a free chunk of size N plus the size of the header.

Growing The Heap

Most traditional allocators start with a small-sized heap and then request more memory from the OS when they run out. Typically, this means they make some kind of system call (e.g., sbrk in most UNIX systems) to grow the heap, and then allocate the new chunks from there. To service the sbrk request, the OS finds free physical pages, maps them into the address space of the requesting process, and then returns the value of the end of the new heap.

Basic Strategies

The ideal allocator is both fast and minimizes fragmentation.

Best Fit

The best fit strategy is quite simple: first, search through the free list and find chunks of free memory that are as big or bigger than the requested size. Then, return the one that is the smallest in that group of candidates; this is the so called best-fit chunk (it could be called smallest fit too).

Worst Fit

The worst fit approach is the opposite of best fit; find the largest chunk and return the requested amount; keep the remaining (large) chunk on the free list. Once again, however, a full search of free space is required.

First Fit

The first fit method simply finds the first block that is big enough and returns the requested amount to the user. As before, the remaining free space is kept free for subsequent requests. First fit has the advantage of speed.

Next Fit

next fit algorithm keeps an extra pointer to the location within the list where one was looking last. The idea is to spread the searches for free space throughout the list more uniformly, thus avoiding splintering of the beginning of the list. The performance of such an approach is quite similar to first fit.

Other Approaches

One interesting approach that has been around for some time is the use of segregated lists. The basic idea is simple: if a particular application has one (or a few) popular-sized request that it makes, keep a separate list just to manage objects of that size; all other requests are forwarded to a more general memory allocator. Example: slab allocator.

Because coalescing is critical for an allocator, some approaches have been designed around making coalescing simple. One good example is found in the binary buddy allocator.

The beauty of buddy allocation is found in what happens when that block is freed. When returning the 8KB block to the free list, the allocator checks whether the “buddy” 8KB is free; if so, it coalesces the two blocks into a 16KB block. The allocator then checks if the buddy of the 16KB block is still free; if so, it coalesces those two blocks. This recursive coalescing process continues up the tree, either restoring the entire free space or stopping when a buddy is found to be in use.

One major problem with many of the approaches described above is their lack of scaling. Specifically, searching lists can be quite slow. Thus, advanced allocators use more complex data structures to address these costs, trading simplicity for performance. Examples include balanced binary trees, splay trees, or partially-ordered trees

Paging: Introduction

paging

It is sometimes said that the operating system takes one of two approaches when solving most any space-management problem. The first approach is to chop things up into variable-sized pieces, as we saw with segmentation in virtual memory.

It may be worth considering the second approach: to chop up space into fixed-sized pieces. In virtual memory, we call this idea paging. Instead of splitting up a process’s address space into some number of variable-sized logical segments (e.g., code, heap, stack), we divide it into fixed-sized units, each of which we call a page. Correspondingly, we view physical memory as an array of fixed-sized slots called page frames.

To record where each virtual page of the address space is placed in physical memory, the operating system usually keeps a per-process data structure known as a page table. The major role of the page table is to store address translations for each of the virtual pages of the address space.

To translate this virtual address that the process generated, we have to first split it into two components: the virtual page number (VPN), and the offset within the page. With our virtual page number, we can now index our page table and find which physical frame virtual page resides within. physical frame number (PFN) is also sometimes called the physical page number or PPN.

截屏2023-03-23 23.08.49.png

Where Are Page Tables Stored?

Because page tables are so big, we store the page table for each process in memory somewhere.

For example, imagine a typical 32-bit address space. This virtual address splits into a 20-bit VPN and 12-bit offset. A 20-bit VPN implies that there are 2^20 translations that the OS would have to manage for each process (that’s roughly a million); assuming we need 4 bytes per page table entry (PTE) to hold the physical translation plus any other useful stuff, we get an immense 4MB of memory needed for each page table! Now imagine there are 100 processes running: this means the OS would need 400MB of memory just for all those address translations!

What’s Actually In The Page Table?

The simplest form is called a linear page table, which is just an array. The OS indexes the array by the virtual page number (VPN), and looks up the page-table entry (PTE) at that index in order to find the desired physical frame number (PFN).

As for the contents of each PTE, we have a number of different bits in there worth understanding at some level.

valid bit is common to indicate whether the particular translation is valid;
protection bits, indicating whether the page could be read from, written to, or executed from.
present bit indicates whether this page is in physical memory or on disk (i.e., it has been swapped out).
reference bit (a.k.a. accessed bit) is sometimes used to track whether a page has been accessed, and is useful in determining which pages are popular and thus should be kept in memory; such knowledge is critical during page replacement.

截屏2023-03-24 10.53.53.png

Figure 18.5 shows an example page table entry from the x86 architecture. It contains a present bit § ; a read/write bit (R/W) which determines if writes are allowed to this page; a user/supervisor bit (U/S) which determines if user-mode processes can access the page; a few bits (PWT, PCD, PAT, and G) that determine how hardware caching works for these pages; an accessed bit (A) and a dirty bit (D) ; and finally, the page frame number (PFN) itself.

Paging: Also Too Slow

hardware must know where the page table is for the currently-running process. Let’s assume for now that a single page-table base register contains the physical address of the starting location of the page table. To find the location of the desired PTE, the hardware will thus perform the following functions:

VPN = (VirtualAddress & VPN_MASK) >> SHIFT

PTEAddr = PageTableBaseRegister + (VPN * sizeof(PTE))

Once this physical address is known, the hardware can fetch the PTE from memory, extract the PFN, and concatenate it with the offset from the virtual address to form the desired physical address.

offset = VirtualAddress & OFFSET_MASK

PhysAddr = (PFN << SHIFT) | offset

For every memory reference (whether an instruction fetch or an explicit load or store), paging requires us to perform one extra memory reference in order to first fetch the translation from the page table. Without careful design of both hardware and software, page tables will cause the system to run too slowly, as well as take up too much memory.

A Memory Trace

disassemble the resulting binary: using objdump on Linux, or otool on a Mac

目录CONTENT

OSTEP-notes-virtualization-chap15-to-18

Address Translation

Dynamic (Hardware-based) Relocation

Segmentation

Segmentation: Generalized Base/Bounds

Which Segment Are We Referring To?

What About The Stack?

Fine-grained vs. Coarse-grained Segmentation

Free-Space Management

Low-level Mechanisms

Splitting and Coalescing

Tracking The Size Of Allocated Regions

Growing The Heap

Basic Strategies

Best Fit

Worst Fit

First Fit

Next Fit

Other Approaches

Paging: Introduction

paging

Where Are Page Tables Stored?

What’s Actually In The Page Table?

Paging: Also Too Slow

A Memory Trace

评论区

OSTEP-notes-virtualization-chap15-to-18

Address Translation

Dynamic (Hardware-based) Relocation

Segmentation

Segmentation: Generalized Base/Bounds

Which Segment Are We Referring To?

What About The Stack?

Support for Sharing

Fine-grained vs. Coarse-grained Segmentation

Free-Space Management

Low-level Mechanisms

Splitting and Coalescing

Tracking The Size Of Allocated Regions

Growing The Heap

Basic Strategies

Best Fit

Worst Fit

First Fit

Next Fit

Other Approaches

Paging: Introduction

paging

Where Are Page Tables Stored?

What’s Actually In The Page Table?

Paging: Also Too Slow

A Memory Trace

评论区