Managing Jobs and Processes

Determining current system load

Load average is probably the best guide for anticipating how much delay to expect when running your jobs. A load average of 10 indicates a heavily loaded system, and you can expect delays. Of course, there is no way to know how long a heavily loaded system will stay that way; only experience with daily usage patterns will help there.

The uptime command prints one line of statistics for the current machine, including the load average.

    almaak.usc.edu(36): uptime
    5:25pm up 6 days, 16 mins, 27 users, load average: 4.57, 4.73, 4.52

The last three numbers are the average number of jobs in the run queue over the last 1, 5, and 15 minutes.
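To use the load average in a script rather than reading it by eye, the 1-minute figure can be pulled out of the uptime line. The following is a minimal sketch in Bourne shell syntax (not C Shell); the function names load_1min and is_heavy are made up for illustration, and the default threshold of 10 is simply the figure quoted above, not a system setting.

```shell
#!/bin/sh
# Print the 1-minute load average from an uptime-style line read on stdin.
# Assumes the line contains "load average: A, B, C"; the pattern also
# accepts the "load averages:" spelling that top prints.
load_1min() {
    sed -n 's/.*load averages\{0,1\}: *\([0-9.]*\).*/\1/p'
}

# Succeed (exit status 0) if load $1 is at or above threshold $2.
# The threshold defaults to 10, the "heavily loaded" figure used above.
is_heavy() {
    awk -v l="$1" -v t="${2:-10}" 'BEGIN { exit !(l + 0 >= t + 0) }'
}

# Demonstration on the sample line shown earlier in this section.
sample='5:25pm up 6 days, 16 mins, 27 users, load average: 4.57, 4.73, 4.52'
echo "$sample" | load_1min
```

On a live system the same function would be used as `uptime | load_1min`, and its result could be passed to is_heavy before starting a large job.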

On RCF, the command /usr/rcf/bin/rcf-load gives load averages for all RCF machines, exactly as seen when you first log in. Not all RCF machines are equally suited to a given purpose, so you will have to apply that load information to your particular use.

The top command displays currently running jobs ranked in order of CPU usage and shows various stats for each, including CPU %, CPU time, resident memory size, nice value (to be explained below), load averages, etc. By default it updates every 5 seconds and shows all users.

Example, showing top niced to +19, to reduce impact on the system:

    almaak.usc.edu(36): nice +19 top
    last pid: 19097; load averages: 7.03, 7.12, 7.46 14:56:47
    625 processes: 581 sleeping, 22 zombie, 15 stopped, 7 on cpu
    CPU states: 0.0% idle, 69.1% user, 14.1% kernel, 16.7% iowait, 0.0% swap
    Memory: 2005M real, 32M free, 902M swap, 2807M free swap

      PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
    19728 guanggon -25   10 1016K  744K cpu    27.1H 12.48% 12.50% a.out
    22826 potatov  -25   10   52M 8928K cpu   138:08 12.50% 12.50% Readncm
    25370 mitaim   -25   10  912K  832K cpu   124:51 12.50% 12.50% gaussh10
    11848 poller   -25   10 7160K 3560K cpu    41:11 12.50% 12.50% matlab
    16838 shariati -25    0   17M 2576K cpu    12:42 12.50% 12.50% vfehs.out
    16793 perryros -25   11   12M   11M cpu    12:22 12.10% 11.96% sht3ell2dprmn2

Options include restricting output to one user only, to a fixed number of processes, etc. Use <Control>-C to exit top.

Server status is also available on the Systems Status page.

Running jobs sequentially

You can combine commands so that they run sequentially rather than simultaneously, which avoids competing with yourself for computing cycles. Do this by separating them with semicolons on the same command line:

    almaak.usc.edu(36): ls; date; whoami

Long command lines may be wrapped by continuing to type without pressing the Return key (up to 256 characters), or by typing a backslash (\) between words, immediately before the newline.
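Both techniques can be seen in a short sketch. The semicolon and backslash behave the same way in the C Shell; the $( ) capture and the variable names sequence and joined are Bourne shell conveniences assumed here only so the results can be displayed.

```shell
#!/bin/sh
# Each command starts only after the previous one has finished,
# so the three words always come out in this order, one per line.
sequence=$(echo first; echo second; echo third)
echo "$sequence"

# A backslash immediately before the newline continues the command on
# the next line; the shell sees a single echo with three arguments.
joined=$(echo one two \
    three)
echo "$joined"
```

The second echo prints its three words on one line, because the backslash-newline pair is removed before the command is run.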

Delaying command execution

With the at command you can delay execution of commands, which then run at slightly reduced priority. The commands can be given either interactively from the command line or in a file, called a shell script (see below). When the job completes, the system automatically sends mail to the user; if the job wrote to standard output, the results are included in this mail.

The at command executes your commands at a specified later time by putting them into queue a, which has a nice value of 1 (see nicing below). You can only submit up to 4 simultaneous jobs. The following example demonstrates how to execute the commands contained in the file myscript at 5pm on Friday:

    at 5pm Friday myscript

The next example shows how to execute commands interactively from the command line (you will need the double quotes), in this case, at 3am on Sunday:

    almaak.usc.edu(36): echo "cmd -options" | at 3:00am Sunday

Lowering job priority

You can reduce the priority level of your job in several ways.

Using the nice Command

The nice command lets you lower the priority of a command by a specific value. By default, the nice value is zero (20 under Solaris 2.5). Niceness represents a scheduling priority based on CPU usage, wait time, etc. A nice value of zero (20 under Solaris) is high priority, while a value of 19 (39 under Solaris) is the lowest. Processes with a high nice value run more slowly when the system is busy, but run faster when fewer jobs are competing. The following are some examples (system prompts are omitted for clarity; C Shell usage shown):

Solaris 2.5:

    nice <command> -options [without increment value, sets nice value to 24]
    nice +10 <command> -options [increments nice value by 10, now equals 30]
    nice +20 <command> -options [any value above 18 gets set to 39, the max]
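The arithmetic behind these three examples can be sketched as a small function. This only illustrates the numbers quoted in this section (base value 20, default increment 4, ceiling 39); it does not call nice or change any priority. The function name solaris_nice is made up for this sketch, and it is written in Bourne shell syntax rather than C Shell.

```shell
#!/bin/sh
# Hypothetical helper: compute the nice value described in the text
# for a given increment, without running any command.
solaris_nice() {
    base=20          # default nice value under Solaris 2.5 (from the text)
    incr=${1:-4}     # with no increment given, the value rises by 4 (20 -> 24)
    value=$((base + incr))
    if [ "$value" -gt 39 ]; then
        value=39     # 39 is the maximum, i.e. the lowest priority
    fi
    echo "$value"
}

solaris_nice       # no increment
solaris_nice 10    # increment of 10
solaris_nice 20    # increment large enough to hit the ceiling
```

The three calls print 24, 30, and 39, matching the bracketed notes in the examples above.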

Using the renice Command

The renice command changes the priority of a process that is already running. You can only decrease the priority; you cannot raise it again, even if you are the user who lowered it in the first place. The PID is the process ID, as shown by the ps command.

Solaris 2.5:

    renice 10 <pid> [increments base value of 20 to 30]

Using the batch Command

The batch command executes your commands by placing them in batch queue b, which has a nice value of 2 and allows at most 2 simultaneous jobs. The following example shows how to batch commands contained in the file called myscript:

    batch myscript

The next example shows how to batch commands given interactively at the command line:

    almaak.usc.edu(36): batch
    at> sas house1
    at> sas house2
    at> sas house3
    at> speakez model1
    at> sas house4
    at> ^D

C Shell Scripts (command files)

You can place your command or series of commands into an executable file which can be “run” just as system commands are run.

  • Create a file with any editor, as long as you save it as text only.
  • Type your commands into the file just as you would type them on the command line.
  • Make the file executable by changing the permissions (for help with this step, please see the Permissions page).

A sample script might look like this:

    #!/bin/csh
    # the first line must be exactly "#!/bin/csh", with nothing after it
    # any other line starting with a # is just a comment
    sas mycmd          # text after a # at the end of a line is also a comment
    echo sas job done  # send a message to the screen when the job is done
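
The three steps listed above (create the file, type in the commands, make it executable) might look like the following sketch when done from the command line. It assumes the mktemp utility is available and uses a throwaway file with a Bourne shell body, purely so that the sketch runs on systems without csh; the file contents are placeholders. With your own script, only the chmod line and the final run are needed.

```shell
#!/bin/sh
# Sketch only: the file name and script body are placeholders.
script=$(mktemp)              # temporary file standing in for "myscript"

# Type the commands into the file just as on the command line.
cat > "$script" <<'EOF'
#!/bin/sh
echo job done
EOF

chmod u+x "$script"           # give the owner execute permission
result=$("$script")           # run it by path, like any other command
echo "$result"

rm -f "$script"               # clean up the temporary file
```

If the chmod step is skipped, the attempt to run the file fails with a "Permission denied" error.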