Introduction#
Corresponding chapter: Chapter Eight
Experiment content: Write a simple Unix shell that supports job control
Experiment purpose: Familiarize with process control and signal handling
Experiment handout: http://csapp.cs.cmu.edu/3e/shlab.pdf
What are Unix Shells#
Wikipedia's explanation of Unix shell:
A Unix shell is a command-line interpreter or shell that provides a command line user interface for Unix-like operating systems.
According to the description in the experiment handout, Unix shell has the following characteristics:
- The shell is an interactive command-line interpreter that runs programs on behalf of the user.
- The shell repeatedly prints a prompt, waiting for the user to input a command line, and then executes the corresponding operation.
- A command line is a sequence of ASCII text words separated by whitespace characters. The first word is the name of a built-in command or the pathname of an executable file, and the remaining words are command line arguments.
- If the first word is a built-in command, the shell executes it immediately. Otherwise, the shell creates a child process and loads and runs the program in that child process. These child processes are collectively referred to as a job.
- If the command line ends with
&
, the job runs in the background, and the shell does not wait for it to finish before printing the next prompt. Otherwise, the job runs in the foreground, and the shell waits for it to finish. - The Unix shell supports job control, allowing users to move jobs between the foreground and background, as well as change the state of processes in a job (running, stopped, or terminated).
- Pressing Ctrl-C sends a SIGINT signal to the foreground job, and the default action is to terminate the process. Pressing Ctrl-Z sends a SIGTSTP signal to the foreground job, and the default action is to stop the process.
- The Unix shell provides several built-in commands to support job control, such as
jobs
,bg
,fg
,kill
, etc.
What is our tsh#
In this experiment, we need to implement a simple shell called tsh, which has the following features:
- The prompt should be the string "tsh> ".
- The command line entered by the user consists of a name and zero or more parameters, with all elements separated by one or more spaces. If the name is a built-in command, tsh should handle it immediately and wait for the next command. Otherwise, tsh should assume the name is the pathname of an executable file and load and run it in the context of the initial child process. In this case, "job" refers to this initial child process.
- tsh does not need to support pipes (|) or I/O redirection (< and >).
- Pressing Ctrl-C (Ctrl-Z) should send a SIGINT (SIGTSTP) signal to the current foreground job and all its child processes. If there is no foreground job, the signal should be ignored.
- If the command line ends with
&
, tsh should run the job in the background. Otherwise, it should run the job in the foreground. - Each job can be identified by a process ID (PID) or job ID (JID), where JID is a positive integer assigned by tsh, represented on the command line with a prefix
%
, such as "%5". - tsh should support the following built-in commands:
quit
: Terminate the shell.jobs
: List all background jobs.bg <job>
: Restart<job>
by sending a SIGCONT signal and run it in the background.<job>
can be a PID or JID.fg <job>
: Restart<job>
by sending a SIGCONT signal and run it in the foreground.<job>
can be a PID or JID.
- tsh should reap all zombie child processes. If any job terminates due to receiving a signal that it has not caught, tsh should recognize this event and print a message containing the PID of the job and a description of the signal that caused the problem.
Specifically, we need to implement the following functions based on the experimental code framework:
eval
: The main program for parsing the command line.builtin_cmd
: Recognize and run built-in commands (quit
,fg
,bg
, andjobs
).do_fgbg
: Implement the built-in commandsbg
andfg
.waitfg
: Wait for a foreground task to complete.sigchld_handler
: Handle SIGCHLD signals.sigint_handler
: Handle SIGINT (ctrl-c) signals.sigtstp_handler
: Handle SIGTSTP (ctrl-z) signals.
Next, we will complete these functions in a reasonable order and parse them.
Completing tsh#
When completing tsh, we should read the trace files provided by the official source, implement its functions one by one, and finally improve error handling statements, explaining the overall result here.
Job Management#
Don't rush; before we start completing these functions, we first need to understand the task list management tools provided by the code framework.
In the shell, the states of tasks are as follows:
- FG (foreground): Running in the foreground
- BG (background): Running in the background
- ST (stopped): Stopped
Among all tasks, only one can be in the FG state, and the conditions for the transitions between the various states of tasks are shown in the following diagram:
The data structure for tsh tasks is as follows:
struct job_t { /* The job struct */
pid_t pid; /* job PID */
int jid; /* job ID [1, 2, ...] */
int state; /* UNDEF, BG, FG, or ST */
char cmdline[MAXLINE]; /* command line */
};
The task list is stored as a global variable:
struct job_t jobs[MAXJOBS]; /* The job list */
tsh provides the following functions to manage tasks:
void clearjob(struct job_t *job);
void initjobs(struct job_t *jobs);
int maxjid(struct job_t *jobs);
int addjob(struct job_t *jobs, pid_t pid, int state, char *cmdline);
int deletejob(struct job_t *jobs, pid_t pid);
pid_t fgpid(struct job_t *jobs);
struct job_t *getjobpid(struct job_t *jobs, pid_t pid);
struct job_t *getjobjid(struct job_t *jobs, int jid);
int pid2jid(pid_t pid);
void listjobs(struct job_t *jobs);
With their names, parameters, and return values, their functions are easy to understand, and we will not explain them here.
waitfg#
The first function we need to implement is to block the current process until the pid is no longer the foreground process.
void waitfg(pid_t pid) {
while (pid == fgpid(jobs)) {
sleep(0);
}
return;
}
builtin_command#
When the user inputs a built-in command, it should be executed immediately; otherwise, it should return 0.
int builtin_cmd(char **argv) {
char *cmd = argv[0];
if (strcmp(cmd, "quit") == 0) {
exit(0); /* execute it immediately */
} else if (strcmp(cmd, "fg") == 0 || strcmp(cmd, "bg") == 0) {
do_bgfg(argv);
return 1; /* is a builtin command */
} else if (strcmp(cmd, "jobs") == 0) {
listjobs(jobs);
return 1;
}
return 0; /* not a builtin command */
}
It only needs to recognize the input arg[0]
(i.e., the program name). If it is one of the three built-in commands, it executes the corresponding function; otherwise, it returns 0.
do_bgfg#
Execute the built-in commands fg and bg.
void do_bgfg(char **argv) {
struct job_t *jobp = NULL;
if (argv[1] == NULL) {
printf("%s command requires PID or %%jobid argument\n", argv[0]);
return;
}
if (isdigit(argv[1][0])) { /* fg/bg pid */
pid_t pid = atoi(argv[1]);
if ((jobp = getjobpid(jobs, pid)) == 0) {
printf("(%d): No such process\n", pid);
return;
}
} else if (argv[1][0] == '%') { /* fg/bg %jid */
int jid = atoi(&argv[1][1]);
if ((jobp = getjobjid(jobs, jid)) == 0) {
printf("%s: No such job\n", argv[1]);
return;
}
} else {
printf("%s: argument must be a PID or %%jobid\n", argv[0]);
return;
}
if (strcmp(argv[0], "bg") == 0) {
jobp->state = BG;
kill(-jobp->pid, SIGCONT);
printf("[%d] (%d) %s", jobp->jid, jobp->pid, jobp->cmdline);
} else if (strcmp(argv[0], "fg") == 0) {
jobp->state = FG;
kill(-jobp->pid, SIGCONT);
waitfg(jobp->pid);
}
return;
}
First, let's explain that the parameter argv
is a two-dimensional pointer array, where each element is a pointer to a string. The first element argv[0]
is the name of the command, such as "fg", and the subsequent elements argv[1]
, argv[2]
, etc., are the parameters of that command, with the last element being NULL to indicate the end.
- If the user inputs fg %1, the argv array will be {"fg", "%1", NULL}.
- If the user inputs bg 2345, the argv array will be {"bg", "2345", NULL}.
There are three if statements here:
The first if statement checks whether the input parameter is empty.
The second if statement checks whether the input parameter is in pid format or %jid format and retrieves the current job pointer.
The third if statement checks whether the current command is fg or bg and sets the job's state field accordingly, sending a SIGCONT signal to the process group of the job. If it is a fg command, it also needs to wait for it to finish.
eval#
Execute the command line entered by the user.
void eval(char *cmdline) {
char *argv[MAXARGS];
int bg = parseline(cmdline, argv);
pid_t pid;
sigset_t mask, prev_mask;
if (builtin_cmd(argv) == 0) { /* not a builtin command */
// Block SIGCHLD signal
sigemptyset(&mask);
sigaddset(&mask, SIGCHLD);
sigprocmask(SIG_BLOCK, &mask, &prev_mask);
pid = fork();
if (pid == 0) {
// Unblock SIGCHLD signal in child process
sigprocmask(SIG_SETMASK, &prev_mask, NULL);
setpgid(0, 0);
if (execve(argv[0], argv, environ) < 0) {
printf("%s: Command not found\n", argv[0]);
exit(1);
}
} else {
if (!bg) {
addjob(jobs, pid, FG, cmdline);
} else {
addjob(jobs, pid, BG, cmdline);
struct job_t *job = getjobpid(jobs, pid);
printf("[%d] (%d) %s", job->jid, pid, cmdline);
}
sigprocmask(SIG_SETMASK, &prev_mask, NULL);
if (!bg) {
waitfg(pid);
}
}
}
return;
}
- Use
parseline
to parse the command line to get the argv array and bg flag. - Use
builtin_cmd
to determine if it is a built-in command and execute it directly. - If it is not a built-in command, use
fork
to create a child process that duplicates tsh. - In the child process, use
setpgid(0, 0)
to set the process group ID of the child process to its own process ID, creating a process group led by it; otherwise, the child process will default to joining the user group of the parent process. Think about what would happen when sending signals to the child process in this case. - In the parent process, add the job to the task list. If it is a foreground process, wait for it to finish; if it is a background process, print the relevant information.
- Block the SIG_CHILD signal before creating the child process, and unblock it after the parent process has finished adding the job. Think about what would happen otherwise. Also, since the child process inherits the blocking vector from the parent process, it needs to unblock the signal in the child process.
Signal Handling#
Our tsh should not terminate upon receiving SIGINT or SIGTSTP but should forward these signals to the child processes. Upon receiving SIG_CHILD, it should remove the child process from the task system. Therefore, we need to use the signal function to modify the default behavior associated with signals and replace it with our handler. When writing the handler, we need to follow some principles to ensure safety:
- G0. Keep the handler as simple as possible.
- G2. Save and restore errno.
- G3. Block all signals when accessing shared global data structures.
sigchld_handler#
The handler for when the process receives a SIGCHLD signal.
void sigchld_handler(int sig) {
int olderrno = errno;
pid_t pid;
int status;
sigset_t mask, prev_mask;
sigfillset(&mask);
while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
if (WIFEXITED(status)) {
sigprocmask(SIG_BLOCK, &mask, &prev_mask);
deletejob(jobs, pid);
sigprocmask(SIG_SETMASK, &prev_mask, NULL);
} else if (WIFSIGNALED(status)) {
sigprocmask(SIG_BLOCK, &mask, &prev_mask);
struct job_t *job = getjobpid(jobs, pid);
printf("Job [%d] (%d) terminated by signal %d\n", job->jid, pid, WTERMSIG(status));
deletejob(jobs, pid);
sigprocmask(SIG_SETMASK, &prev_mask, NULL);
} else if (WIFSTOPPED(status)) {
sigprocmask(SIG_BLOCK, &mask, &prev_mask);
struct job_t *job = getjobpid(jobs, pid);
job->state = ST;
printf("Job [%d] (%d) stopped by signal %d\n", job->jid, pid, WSTOPSIG(status));
sigprocmask(SIG_SETMASK, &prev_mask, NULL);
}
}
errno = olderrno;
return;
}
This program has the following key points:
- Save errno at the beginning and restore it at the end.
- Use
sigprocmask
to block all signals when operating on the global variable jobs, and restore the blocking vector afterward. - Use a while loop to retrieve terminated child processes, as the SIGCHLD signal may be blocked, so we need to handle as many terminated child processes as possible at once.
- We use
WIFEXITED()
,WIFSIGNALED()
, andWIFSTOPPED()
to check the state of the child process and perform the corresponding actions.
sigint_handler#
The handler for when the process receives a SIGINT signal.
void sigint_handler(int sig) {
pid_t pid;
int olderrno = errno;
sigset_t mask, prev_mask;
sigfillset(&mask);
sigprocmask(SIG_BLOCK, &mask, &prev_mask);
pid = fgpid(jobs);
sigprocmask(SIG_SETMASK, &prev_mask, NULL);
if (pid > 0) {
kill(-pid, sig);
}
errno = olderrno;
return;
}
- Note the saving and restoring of errno.
- Block all signals when operating on global variables.
sigtstp_handler#
The handler for when the process receives a SIGTSTP signal.
void sigtstp_handler(int sig) {
pid_t pid;
int olderrno = errno;
sigset_t mask, prev_mask;
sigfillset(&mask);
sigprocmask(SIG_BLOCK, &mask, &prev_mask);
pid = fgpid(jobs);
sigprocmask(SIG_SETMASK, &prev_mask, NULL);
if (pid > 0) {
kill(-pid, sig);
}
errno = olderrno;
return;
}
Similar to the previous one.
Summary#
Through this experiment, we implemented a simple shell with job control and gained a better understanding of exception control flow and signals.