====== Writing Your Own Shell ======

----

(Adapted from linuxgazette.net/111/ramankutty.html)

===== Shells =====

When we start a Terminal or log in to a UNIX system remotely via SSH or something similar, we see the UNIX shell, where we usually enter the commands we want to execute. The shell is a command-line interpreter that reads user input and executes commands. To execute the commands that we type in, the shell relies on the ''execve()'' system call.

===== A note on execve() =====

Briefly, ''execve'' and its family of functions help to start new programs. The family consists of the following functions:

  * ''execl''
  * ''execv''
  * ''execle''
  * ''execve''
  * ''execlp''
  * ''execvp''

The prototype given in the man page for ''execve'' is the following:

  int execve(const char *filename, char *const argv[], char *const envp[]);

The ''filename'' parameter is the complete path name of the executable. ''argv'' and ''envp'' are NULL-terminated arrays of strings containing the command-line arguments and the environment variables, respectively. If you want to see the environment variables that are set for you after login, start a Terminal and run one of the following commands:

  env

or

  set

On Linux, the actual kernel-level system call is ''sys_execve'' (for the ''execve'' function); the other functions in this family are just C wrapper functions around ''execve''.

Now, let us write a small program using ''execve''. See the listing below:

[[http://rockhopper.monmouth.edu/~jchung/cs438/fa13/src/c/shell/listing1.c|listing1.c]]

Compiling and running the above program gives the output of the ''/bin/ls'' command. Now try this: put a ''printf'' statement right after the ''execve'' call and run the code. The ''printf'' statement will appear not to be executed at all. Can you find the explanation for this behavior in the appropriate [[https://en.wikipedia.org/wiki/Exec_%28operating_system%29|Wikipedia article on exec]]?
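
To make the layout of ''envp'' more concrete, here is a minimal sketch of looking up one variable in such an array. It assumes only what the ''execve'' man page states: the array is NULL-terminated and each entry has the form "NAME=value". The helper name ''find_env_value'' is hypothetical and does not appear in any of the listings.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the listings): return a pointer to the
 * value of `name` inside a NULL-terminated envp-style array, or NULL if no
 * entry of the form "name=value" is present.
 */
const char *find_env_value(char *const envp[], const char *name)
{
    size_t name_len = strlen(name);

    for (size_t i = 0; envp[i] != NULL; i++) {
        /* An entry matches only if it starts with "name" followed by '='. */
        if (strncmp(envp[i], name, name_len) == 0 && envp[i][name_len] == '=') {
            return envp[i] + name_len + 1;
        }
    }
    return NULL;
}
```

Calling ''find_env_value(envp, "PATH")'' from a ''main'' that receives ''envp'' as its third parameter should return the same colon-separated string that ''echo $PATH'' prints. The check for the '=' character keeps a lookup of ''PATH'' from accidentally matching a longer name that merely begins with the same letters.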
===== Some basics =====

Before we start writing our shell, we will look at the sequence of events that occurs from the point when the user types something at the shell to the point when they see the output of the command that they typed. A lot of processing has to happen, even for a simple listing of files.

  - When the user hits the 'Enter' key after typing '/bin/ls', the program which runs the command (the shell) ''[[https://en.wikipedia.org/wiki/Fork_%28system_call%29|forks]]'' a new process.
  - This forked process invokes the ''execve'' system call to run '/bin/ls'.
  - The complete path, '/bin/ls', is passed as a parameter to ''execve'' along with the command line arguments (''argv'') and environment variables (''envp'').
  - The kernel-level system call handler ''sys_execve'' checks for the existence of the file '/bin/ls'. If the file exists, it then checks whether the file is in an executable file format. If it is, the [[cs-438_processes_and_address_spaces?s[]=context#process_switching|execution context]] of the forked process is altered, i.e., '/bin/ls' overlays the context of the forked process.
  - Finally, when the system call ''sys_execve'' completes, the forked process is running '/bin/ls' and the user sees the output of '/bin/ls'.

===== Getting Started =====

Let us start with some basic features of the command shell. The listing below tries to interpret the 'Enter' key being pressed by the user at the command prompt.

[[http://rockhopper.monmouth.edu/~jchung/cs438/fa13/src/c/shell/listing2.c|listing2.c]]

Whenever the user hits the 'Enter' key, the command prompt appears again. On running this code, if the user hits Ctrl-D, the program terminates. This is similar to how your default shell interprets Ctrl-D, which signals end-of-file on the input: when you hit Ctrl-D, you log out of the shell and, therefore, the Terminal.

Let us add another feature to interpret a Ctrl-C input.
It can be done simply by registering a [[cs-438_signals_shared_memory_and_ipc#unix_signals|signal handler]] for SIGINT. And what should the signal handler do? Let us see the code in listing 3.

[[http://rockhopper.monmouth.edu/~jchung/cs438/fa13/src/c/shell/listing3.c|listing3.c]]

Run the program and hit Ctrl-C. What happens? You will see the command prompt again. Now try this: remove the statement ''fflush(stdout)'' and run the program. The standard C function ''fflush'' forces any buffered output for the standard output stream to be written out immediately. Without it, we don't immediately see the command prompt after we issue a Ctrl-C.

===== Command Execution =====

Let us expand the features of our shell to execute some basic commands. Primarily, we will read the user's input, check whether such a command exists, and execute it.

We will read the user's input using ''getchar()''. Every character read is placed in a temporary array. The temporary array will be parsed later to assemble the complete command, along with its command line options. Reading characters continues until the user hits the 'Enter' key. This is shown in listing 4 (not meant to be compilable; for illustration purposes only).

[[http://rockhopper.monmouth.edu/~jchung/cs438/fa13/src/c/shell/listing4.c|listing4.c]]

Now we have a string consisting of the characters that the user typed at our command prompt. Next we have to parse it to separate the command from the command options. To make this clearer, let us assume that the user types the following command:

  gcc -o hello hello.c

We will then have the command line arguments as

  argv[0] = "gcc"
  argv[1] = "-o"
  argv[2] = "hello"
  argv[3] = "hello.c"

Instead of using ''argv'', we will create our own data structure (an array of strings) to store the command line arguments. The listing (''listing5.c'') contains only the definition of the function ''fill_argv''. It takes the user input string as a parameter and parses it to fill the ''my_argv'' array.
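
As a rough illustration of this kind of parsing, here is a minimal sketch that splits an input line on blanks into a fixed-size array of strings. It is not the code from ''listing5.c'': the function name ''split_on_blanks'', the two-dimensional ''args'' array, and the ''MAX_ARG_LEN'' buffer size are all assumptions made for this example.

```c
#include <stddef.h>

#define MAX_ARGS 10      /* assumed cap on the number of arguments */
#define MAX_ARG_LEN 100  /* assumed per-argument buffer size */

/*
 * Hypothetical stand-in for fill_argv: split `line` on blanks and copy each
 * word into args[]. Returns the number of words stored (at most MAX_ARGS).
 * Words longer than MAX_ARG_LEN - 1 characters are truncated.
 */
int split_on_blanks(const char *line, char args[MAX_ARGS][MAX_ARG_LEN])
{
    int count = 0;
    size_t i = 0;

    while (line[i] != '\0' && count < MAX_ARGS) {
        /* Skip any run of blanks (and the trailing newline) between words. */
        while (line[i] == ' ' || line[i] == '\n') {
            i++;
        }
        if (line[i] == '\0') {
            break;
        }
        /* Copy characters up to the next blank, newline, or end of string. */
        size_t len = 0;
        while (line[i] != '\0' && line[i] != ' ' && line[i] != '\n') {
            if (len < MAX_ARG_LEN - 1) {
                args[count][len++] = line[i];
            }
            i++;
        }
        args[count][len] = '\0';
        count++;
    }
    return count;
}
```

For the example command above, the function stores "gcc", "-o", "hello" and "hello.c" in ''args[0]'' through ''args[3]'' and returns 4. Note that an array passed to ''execve'' must hold pointers and end with a NULL entry, so a real shell still needs that final step.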
We separate the command from the command line options using the blank spaces (' ') between them. The blank space here is called a "delimiter".

[[http://rockhopper.monmouth.edu/~jchung/cs438/fa13/src/c/shell/listing5.c|listing5.c]]

The user input string is scanned one character at a time. Characters between the blanks are copied into the ''my_argv'' array. I have limited the number of arguments to 10, an arbitrary decision; we can have more than 10 if we want.

Finally, we will have the whole user input string in the array elements ''my_argv[0]'' to ''my_argv[9]''. The command will be contained in ''my_argv[0]'' and the command options (if any) will be in ''my_argv[1]'' to ''my_argv[k]'', where ''k'' is at most 9. What next?

After parsing the command and command line arguments, we have to find out whether the command exists. Calls to ''execve'' will fail if the command does not exist. Note that the command passed to ''execve'' should contain the complete file system path to the command. The environment variable ''$PATH'' stores the different paths where the binaries could be present. The paths (one or more) are stored in ''$PATH'' and are separated by colons. (Run 'echo $PATH' in a Terminal to see the value of ''$PATH''.) These paths have to be searched to see if the command exists. The ''execlp'' and ''execvp'' wrapper functions do this search automatically, but I am purposely avoiding them here. The listing below contains the definition of a function that checks for the existence of the command.

[[http://rockhopper.monmouth.edu/~jchung/cs438/fa13/src/c/shell/listing6.c|listing6.c]]

The ''attach_path'' function in Listing 6 will be called only if its parameter ''cmd'' does not contain a '/' character. When the command contains a '/', the user is specifying a path for the command, so ''attach_path'' is not needed. So, we have:

  /* If cmd does not contain '/', then do attach_path. */
  if(index(cmd, '/') == NULL) {
      attach_path(cmd);
      .....
  }

(''index'' is the older BSD name for the standard C function ''strchr''.)

The function ''attach_path'' uses an array of strings, which is initialized with the paths defined in the environment variable ''$PATH''. This initialization is given in the listing below (''listing7.c''):

[[http://rockhopper.monmouth.edu/~jchung/cs438/fa13/src/c/shell/listing7.c|listing7.c]]

Listing 7 shows two functions. The function ''get_path_string'' takes the environment variables as a parameter and reads the value of the entry for ''$PATH''. For example, say that we have the following value for ''$PATH'':

  PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/jchung

The function uses ''strstr'' from the standard C library to get a pointer to the beginning of the complete string. This is used by the function ''insert_path_str_to_search'' in Listing 7 to parse the different paths and store them in a variable which is used to determine the existence of paths. There are other, more efficient methods for parsing.

After the function ''attach_path'' determines that the command exists, it invokes ''execve'' to execute the command. Note that ''attach_path'' prefixes the command with its complete path. For example, if the user inputs 'ls', then ''attach_path'' modifies it to the complete path '/bin/ls'. This string ('/bin/ls') is then passed to ''execve'' along with the command line arguments (if any) and the environment variables. The listing below (''listing8.c'') shows this:

[[http://rockhopper.monmouth.edu/~jchung/cs438/fa13/src/c/shell/listing8.c|listing8.c]]

Here, ''execve'' is called by a ''forked'' child process, so that the context of the parent process (the shell) is retained.

===== Complete Code and Incompleteness =====

The listing below (''listing9.c'') is the complete code for our simple shell.

[[http://rockhopper.monmouth.edu/~jchung/cs438/fa13/src/c/shell/listing9.c|listing9.c]]

Compile it as ''myshell'' and run the code. Try to run some basic commands; it should work.
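
For orientation while reading the complete listing, the central step of such a shell (fork a child, call ''execve'' in the child, wait in the parent) can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from ''listing8.c'' or ''listing9.c'': the helper name ''run_command'' is hypothetical, and the exit status 127 for a failed ''execve'' is only a common shell convention.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run the program at `path` in a child process and
 * return its exit status, or -1 if the child could not be created, could
 * not be waited for, or did not exit normally.
 */
int run_command(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: execve only returns if it failed to start the program. */
        execve(path, argv, envp);
        perror("execve");
        _exit(127);   /* assumed convention for "could not execute" */
    }

    /* Parent (the shell): wait for the child, then carry on. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because only the child's context is replaced, the parent is still running when ''waitpid'' returns and can print its prompt again. The ''path'' argument is expected to be a complete path such as '/bin/ls', since ''execve'' itself does not search ''$PATH''.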
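Do not be surprised if 'cd' does not work. The 'cd' command and several other commands are built into other shells but not this one; a separate program run in a child process cannot change the working directory of the shell itself. Adding such built-in commands is future homework.

----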