OSTEP-notes-virtualization-chap1-to-5

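Most of this content is excerpted from OSTEP, filled in with the manual pages and some ChatGPT Q&A. The goal is to cover some important and interesting operating-systems topics so that I can review and look things up easily.

Official code repository: https://github.com/remzi-arpacidusseau/ostep-code

You really have to run and debug the example code in VS Code and a terminal to get familiar with it. When you get stuck, ask ChatGPT or go straight to man and search.

These notes carry some of my own subjective views, and hardly anyone reads textbooks, let alone summaries of them. So my OSTEP notes are written entirely in English, which saves the time of proofreading machine translation.

Of course, if they help you, I am happy too!!! After all, my notes are fairly comprehensive and easy on the eyes, because I review them often myself 🤪.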

intro

The real point of education is to get you interested in something, to learn something more about the subject matter on your own and not just what you have to digest to get a good grade in some class.

The OS takes a physical resource (such as the processor, or memory, or a disk) and transforms it into a more general, powerful, and easy-to-use virtual form of itself. Thus, we sometimes refer to the operating system as a virtual machine.

Because virtualization allows many programs to run (thus sharing the CPU), and many programs to concurrently access their own instructions and data (thus sharing memory), and many programs to access devices (thus sharing disks and so forth), the OS is sometimes known as a resource manager. Each of the CPU, memory, and disk is a resource of the system.

So now you have some idea of what an OS actually does: it takes physical resources, such as a CPU, memory, or disk, and virtualizes them. It handles tough and tricky issues related to concurrency. And it stores files persistently, thus making them safe over the long term.

One of the most basic goals is to build up some abstractions in order to make the system convenient and easy to use. Abstractions are fundamental to everything we do in computer science. Abstraction makes it possible to write a large program by dividing it into small and understandable pieces, to write such a program in a high-level language like C without thinking about assembly, and to write code in assembly without thinking about logic gates.

In the beginning, the operating system didn’t do too much. Basically, it was just a set of libraries of commonly-used functions; for example, instead of having each programmer of the system write low-level I/O handling code, the “OS” would provide such APIs, and thus make life easier for the developer.

The key difference between a system call and a procedure call is that a system call transfers control (i.e., jumps) into the OS while simultaneously raising the hardware privilege level.
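A rough sketch of the distinction, assuming a POSIX system. add() is an ordinary procedure call that never leaves user mode, while write() and read() are thin C library wrappers that trap into the kernel. The helper names add and write_through_kernel are made up for illustration.

```c
#include <string.h>
#include <unistd.h>

// Ordinary procedure call: runs entirely in user mode.
int add(int a, int b) {
    return a + b;
}

// write() traps into the kernel, which copies the bytes into a pipe
// and then returns to user mode; read() traps again to copy them out.
// Returns the number of bytes that made the round trip, or -1 on error.
// Assumes cap >= 1 and a message short enough to fit in the pipe buffer.
long write_through_kernel(const char *msg, char *out, size_t cap) {
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    ssize_t w = write(fds[1], msg, strlen(msg));
    close(fds[1]);
    ssize_t r = (w < 0) ? -1 : read(fds[0], out, cap - 1);
    close(fds[0]);
    if (r < 0)
        return -1;
    out[r] = '\0';
    return (long)r;
}
```

From the caller's side both look like plain C function calls; the privilege change happens inside the library wrapper.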

processes

The abstraction provided by the OS of a running program is something we will call a process.

Process API

Create: An operating system must include some method to create new processes. When you type a command into the shell, or double-click on an application icon, the OS is invoked to create a new process to run the program you have indicated.
Destroy: As there is an interface for process creation, systems also provide an interface to destroy processes forcefully. Of course, many processes will run and just exit by themselves when complete; when they don’t, however, the user may wish to kill them, and thus an interface to halt a runaway process is quite useful.
Wait: Sometimes it is useful to wait for a process to stop running; thus some kind of waiting interface is often provided.
Miscellaneous Control: Other than killing or waiting for a process, there are sometimes other controls that are possible. For example, most operating systems provide some kind of method to suspend a process (stop it from running for a while) and then resume it (continue it running).
Status: There are usually interfaces to get some status information about a process as well, such as how long it has run for, or what state it is in.

create

The first thing that the OS must do to run a program is to load its code and any static data (e.g., initialized variables) into memory, into the address space of the process. Modern OSes perform this loading lazily, i.e., by loading pieces of code or data only as they are needed during program execution.

Once the code and static data are loaded into memory, there are a few other things the OS needs to do before running the process. Some memory must be allocated for the program’s run-time stack.

The OS may also allocate some memory for the program’s heap. In C programs, the heap is used for explicitly requested dynamically-allocated data; programs request such space by calling malloc() and free it explicitly by calling free().
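To make the three regions concrete, here is a small sketch. The names greeting and copy_greeting are invented for this note: greeting is static data loaded with the program, local lives on the run-time stack, and the malloc() result lives on the heap.

```c
#include <stdlib.h>
#include <string.h>

// Static data: loaded with the program, lives for its whole run.
const char greeting[] = "hello";

// Returns a heap-allocated copy of greeting; the caller must free() it.
char *copy_greeting(void) {
    char local[sizeof greeting];            // stack: gone once we return
    memcpy(local, greeting, sizeof greeting);

    char *heap = malloc(sizeof greeting);   // heap: lives until free()
    if (heap != NULL)
        memcpy(heap, local, sizeof greeting);
    return heap;
}
```

Returning local itself would be a bug, because its stack slot is reused after the function returns; the heap copy stays valid until the caller frees it.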

The OS will also do some other initialization tasks, particularly as related to input/output (I/O).

One last task: to start the program running at the entry point, namely main(). By jumping to the main() routine, the OS transfers control of the CPU to the newly-created process, and thus the program begins its execution.

states

Running: In the running state, a process is running on a processor. This means it is executing instructions.

Ready: In the ready state, a process is ready to run but for some reason the OS has chosen not to run it at this given moment.

Blocked: In the blocked state, a process has performed some kind of operation that makes it not ready to run until some other event takes place. A common example: when a process initiates an I/O request to a disk, it becomes blocked and thus some other process can use the processor.
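The three states and the moves between them can be sketched as a tiny state machine. This is only a toy model; the event names (SCHEDULED, DESCHEDULED, IO_START, IO_DONE) and the step function are my own labels, not an OS interface.

```c
#include <stdbool.h>

enum state { READY, RUNNING, BLOCKED };
enum event { SCHEDULED, DESCHEDULED, IO_START, IO_DONE };

// Applies one event to *s. Returns false and leaves *s unchanged
// if the event makes no sense in the current state.
bool step(enum state *s, enum event e) {
    switch (*s) {
    case READY:
        if (e == SCHEDULED) { *s = RUNNING; return true; }
        break;
    case RUNNING:
        if (e == DESCHEDULED) { *s = READY; return true; }
        if (e == IO_START) { *s = BLOCKED; return true; }
        break;
    case BLOCKED:
        if (e == IO_DONE) { *s = READY; return true; }
        break;
    }
    return false;
}
```

Note that a blocked process goes back to ready, not straight to running, when its I/O completes; the scheduler still decides when it gets the CPU.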


data structures

The OS is a program, and like any program, it has some key data structures that track various relevant pieces of information. To track the state of each process, for example, the OS likely will keep some kind of process list for all processes that are ready and some additional information to track which process is currently running. The OS must also track, in some way, blocked processes; when an I/O event completes, the OS should make sure to wake the correct process and ready it to run again.

// The xv6 Proc Structure

// the registers xv6 will save and restore
// to stop and subsequently restart a process
struct context {
	int eip;
	int esp;
	int ebx;
	int ecx;
	int edx;
	int esi;
	int edi;
	int ebp;
};

// the different states a process can be in
enum proc_state { UNUSED, EMBRYO, SLEEPING,
RUNNABLE, RUNNING, ZOMBIE };

// the information xv6 tracks about each process
// including its register context and state
struct proc {
	char *mem;                  // Start of process memory
	uint sz;                    // Size of process memory
	char *kstack;               // Bottom of kernel stack for this process
	enum proc_state state;      // Process state
	int pid;                    // Process ID
	struct proc *parent;        // Parent process
	void *chan;                 // If !zero, sleeping on chan
	int killed;                 // If !zero, has been killed
	struct file *ofile[NOFILE]; // Open files
	struct inode *cwd;          // Current directory
	struct context context;     // Switch here to run process
	struct trapframe *tf;       // Trap frame for the current interrupt
};

Sometimes people refer to the individual structure that stores information about a process as a Process Control Block (PCB), a fancy way of talking about a C structure that contains information about each process (also sometimes called a process descriptor).
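As a toy illustration of how a process list and the chan field might work together, here is a user-space sketch. Everything in it (toy_proc, ptable, toy_wakeup, the two event variables) is hypothetical and much simpler than the real xv6 code.

```c
#include <stddef.h>

enum toy_state { TOY_UNUSED, TOY_RUNNABLE, TOY_RUNNING, TOY_SLEEPING };

struct toy_proc {
    int pid;
    enum toy_state state;
    const void *chan;   // what a sleeping process is waiting on
};

// Addresses of these variables serve as unique "channels".
int disk_event, tty_event;

#define TOY_NPROC 4
struct toy_proc ptable[TOY_NPROC] = {
    { 1, TOY_RUNNING,  NULL },
    { 2, TOY_RUNNABLE, NULL },
    { 3, TOY_SLEEPING, &disk_event },
    { 4, TOY_SLEEPING, &tty_event },
};

// Called when the event identified by chan completes: every process
// sleeping on chan becomes runnable again. Returns how many were woken.
int toy_wakeup(const void *chan) {
    int woken = 0;
    for (int i = 0; i < TOY_NPROC; i++) {
        if (ptable[i].state == TOY_SLEEPING && ptable[i].chan == chan) {
            ptable[i].state = TOY_RUNNABLE;
            ptable[i].chan = NULL;
            woken++;
        }
    }
    return woken;
}
```

Calling toy_wakeup(&disk_event) readies pid 3 and leaves pid 4 asleep, which is the "wake the correct process" requirement in miniature.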

process API

UNIX presents one of the most intriguing ways to create a new process with a pair of system calls: fork() and exec(). A third routine, wait(), can be used by a process wishing to wait for a process it has created to complete.

learn to read the fucking manual!

manual sections

The number in front of the name in a man invocation (for example, the 2 in man 2 fork) selects the section of the manual where that entry is documented. Here is a brief summary of the different manual sections and their contents:

Section 1: User commands. This contains the man pages for commands that can be run by regular users, such as ls or cd.

Section 2: System calls. This contains the man pages for the low-level system calls that can be used by a program to interact with the kernel or the operating system directly, such as wait() or open().

Section 3: Library functions. This contains the man pages for the library functions that can be used by a program to interact with the standard C library or other libraries, such as printf() or malloc().

Section 4: Devices and drivers. This contains the man pages for device files and drivers on the system, such as hard drives or printers.

Section 5: File formats and conventions. This contains the man pages for file formats and conventions that are used on the system, such as passwd or fstab.

Section 6: Games. This contains the man pages for games that are included on the system.

Section 7: Miscellaneous. This contains the man pages for miscellaneous subjects, such as conventions, protocols, and standards (for example, signal(7) or ip(7) on Linux).

Section 8: System administration commands. This contains the man pages for commands that are typically used by system administrators, such as iptables or mount.

You can access the manual page for a specific section by specifying its number while using the man command. For example, man 1 ls will display the manual page for the ls command in section 1.

fork()

// man 2 fork
pid_t fork(void);

Used to create a child process.

fork() is declared in the unistd.h header file, with the signature shown above. It returns the child's process ID (a pid_t) to the parent process and 0 to the child process. If an error occurs during fork(), -1 is returned to the parent process and no child is created.
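A minimal sketch of the three return cases, assuming a POSIX system. The function name fork_copies_memory is made up for this note. It also shows that the child gets its own copy of the parent's memory, so a write in the child is invisible to the parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns the parent's value of x after the child has changed its
// own copy, or -1 if fork() or waitpid() fails.
int fork_copies_memory(void) {
    int x = 100;
    pid_t rc = fork();
    if (rc < 0)
        return -1;              // fork failed, no child exists
    if (rc == 0) {
        x = 200;                // child: changes only its private copy
        _exit(0);               // leave without running parent code
    }
    // parent: rc is the child's PID
    if (waitpid(rc, NULL, 0) != rc)
        return -1;
    return x;                   // the parent's x was never touched
}
```

The child uses _exit() rather than return so that it does not fall through into whatever the parent was going to do next.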

wait()

// man 2 wait
pid_t wait(int *stat_loc);
pid_t waitpid(pid_t pid, int *stat_loc, int options);

Used to suspend the parent process until a child terminates.

waitpid(): This function suspends the calling process until one of its child processes changes state (most commonly, terminates). Its parameters specify which child to wait for, where to store the child's status, and options that control how to wait.

wait(): This function is similar to waitpid(), but it waits for any child process to terminate rather than a specific one, and it takes no options. wait(&status) is equivalent to waitpid(-1, &status, 0), so the status information it returns is the same.

Both functions are useful for managing child processes in a parent process and ensuring that the parent process does not exit before all child processes have completed their operations.
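Here is a small sketch of a parent reaping several children with wait() and decoding each status with the WIFEXITED and WEXITSTATUS macros. The function name sum_exit_codes is invented for illustration, and error handling is kept deliberately thin.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks n children, where child i exits with codes[i], then reaps all
// of them in whatever order they finish. Returns the sum of the exit
// codes, or -1 on failure.
int sum_exit_codes(const int *codes, int n) {
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(codes[i]);            // child: terminate right away
    }

    int sum = 0;
    for (int i = 0; i < n; i++) {
        int status;
        if (wait(&status) < 0)          // blocks until any child ends
            return -1;
        if (WIFEXITED(status))          // normal exit, not a signal
            sum += WEXITSTATUS(status);
    }
    return sum;
}
```

Swapping wait(&status) for waitpid(pid, &status, 0) with a saved pid would reap the children in a fixed order instead.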

execve()

// man 2 execve
int execve(const char *path, char *const argv[], char *const envp[]);

// man 3 execvp: a library function implemented on top of the execve() system call
int execvp(const char *file, char *const argv[]);

This system call is useful when you want to run a program that is different from the calling program.

It does not create a new process; rather, it transforms the currently running program into a different running program. After the exec() in the child, it is almost as if the child's original program never ran; a successful call to exec() never returns.
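A short sketch of the usual fork-then-exec pattern, assuming sh is on the PATH. The helper name run_sh_exit is made up. The line after execvp() only runs if the exec failed, which is the practical meaning of "a successful call never returns".

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `sh -c "exit <code>"` in a child process and returns the
// shell's exit status, or -1 on failure.
int run_sh_exit(int code) {
    char cmd[32];
    snprintf(cmd, sizeof cmd, "exit %d", code);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *args[] = { "sh", "-c", cmd, NULL };
        execvp(args[0], args);  // on success the child is now sh
        _exit(127);             // reached only if execvp() failed
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The PID seen by the parent stays the same across the exec; only the program running inside that process changes.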

what’s the point of all this?

Well, as it turns out, the separation of fork() and exec() is essential in building a UNIX shell, because it lets the shell run code after the call to fork() but before the call to exec(); this code can alter the environment of the about-to-be-run program, and thus enables a variety of interesting features to be readily built.

The shell is just a user program.

It shows you a prompt and then waits for you to type something into it. You then type a command (i.e., the name of an executable program, plus any arguments) into it; in most cases, the shell then figures out where in the file system the executable resides, calls fork() to create a new child process to run the command, calls some variant of exec() to run the command, and then waits for the command to complete by calling wait(). When the child completes, the shell returns from wait() and prints out a prompt again, ready for your next command.

The separation of fork() and exec() allows the shell to do a whole bunch of useful things rather easily.

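#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int rc = fork();
    if (rc < 0) {
        // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {
        // child: redirect standard output to a file.
        // open() returns the lowest free descriptor, normally 1 (stdout) here
        close(STDOUT_FILENO);
        open("./p4.output", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);

        // now exec "wc"...
        char *myargs[3];
        myargs[0] = strdup("wc");   // program: "wc" (word count)
        myargs[1] = strdup("p4.c"); // argument: file to count
        myargs[2] = NULL;           // marks end of array
        execvp(myargs[0], myargs);  // runs word count

        // reached only if execvp() failed
        fprintf(stderr, "exec failed\n");
        exit(1);
    } else {
        // parent goes down this path (original process)
        int wc = wait(NULL);
        assert(wc >= 0);
    }
    return 0;
}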

This example uses the gap between fork() and exec() to do some preparation: redirecting output!

UNIX pipes are implemented in a similar way, but with the pipe() system call. In this case, the output of one process is connected to an in-kernel pipe (i.e., queue), and the input of another process is connected to that same pipe; thus, the output of one process is seamlessly used as input to the next, and long and useful chains of commands can be strung together.
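A rough sketch of the plumbing, assuming echo is on the PATH. The helper name capture_echo is made up. The child's standard output is pointed at the write end of a pipe with dup2() before the exec, and the parent reads the other end.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `echo hello` with its stdout connected to a pipe and collects
// the output in buf. Returns the number of bytes read, or -1 on error.
// Assumes cap >= 1.
long capture_echo(char *buf, size_t cap) {
    int fds[2];                     // fds[0]: read end, fds[1]: write end
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);                  // child does not read
        dup2(fds[1], STDOUT_FILENO);    // stdout now feeds the pipe
        close(fds[1]);
        execlp("echo", "echo", "hello", (char *)NULL);
        _exit(127);                     // reached only if exec failed
    }

    close(fds[1]);      // parent must close its write end to see EOF
    size_t used = 0;
    ssize_t n = 0;
    while (used < cap - 1 &&
           (n = read(fds[0], buf + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n < 0 ? -1 : (long)used;
}
```

A shell builds cmd1 | cmd2 the same way, except that it forks a second child and points that child's standard input at fds[0] instead of reading the pipe itself.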

signals

The kill() system call is used to send signals to a process, including directives to pause, die, and other useful imperatives.
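A minimal sketch of kill() in action, with a made-up helper name kill_child. The parent sends SIGKILL (chosen here because it cannot be caught or ignored) to a child that would otherwise sleep forever, then checks how the child died.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that waits forever, kills it with SIGKILL, and returns
// the number of the signal that terminated it, or -1 on failure.
int kill_child(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            // child: sleep until a signal arrives
    }

    if (kill(pid, SIGKILL) < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

This is what the kill command in the shell does under the hood: it is a small program wrapped around this system call.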

For convenience, in most UNIX shells, certain keystroke combinations are configured to deliver a specific signal to the currently running process; for example, control-c sends a SIGINT (interrupt) to the process (normally terminating it) and control-z sends a SIGTSTP (stop) signal, thus pausing the process in mid-execution.

The entire signals subsystem provides a rich infrastructure to deliver external events to processes, including ways to receive and process those signals within individual processes, and ways to send signals to individual processes as well as entire process groups.

A process should use the signal() system call to “catch” various signals; doing so ensures that when a particular signal is delivered to a process, it will suspend its normal execution and run a particular piece of code in response to the signal.
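A small sketch of catching a signal, assuming a single-threaded POSIX program. The names got_usr1, on_usr1 and catch_usr1 are made up. The program installs a handler for SIGUSR1 and then sends that signal to itself with raise().

```c
#include <signal.h>

// Set by the handler; sig_atomic_t is safe to write from a handler.
static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int signo) {
    (void)signo;
    got_usr1 = 1;   // keep handlers tiny: just record that it happened
}

// Installs the handler, sends SIGUSR1 to ourselves, and reports
// whether the handler ran (1) or not (0). Returns -1 on failure.
int catch_usr1(void) {
    got_usr1 = 0;
    if (signal(SIGUSR1, on_usr1) == SIG_ERR)
        return -1;
    if (raise(SIGUSR1) != 0)
        return -1;
    return got_usr1;
}
```

Without the handler, the default action for SIGUSR1 is to terminate the process. New code often uses sigaction() instead of signal(), since its behavior is specified more precisely.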

summary

To be a master of coding, here is what you must face.

[Screenshot 2023-03-20 22.40.26.png]
