IRIX Advanced Site and Server Administration Guide
This chapter deals with using the accounting utilities to keep track of system use. See "Process (System) Accounting".
IRIX provides utilities to log certain types of system activity. These utilities perform process accounting.
The IRIX process accounting system can provide the following information:
the number of programs a user runs
the size and duration of user programs
data throughput (I/O)
Using this information, you can:
Determine how system resources are used and if a particular user is
using more than a reasonable share.
Trace significant system events, such as security breaches, by examining
the list of all processes invoked by a particular user at a particular
time.
Set up billing systems to charge login accounts for using system resources.
The next sections describe the parts of process accounting, how to turn process accounting on and off, and how to look at the various log files.
The IRIX process accounting system has several parts:
The IRIX kernel writes a record of each process on the system that terminates into the file /var/adm/pacct. The file contains one record per terminated process, organized according to the format defined in /usr/include/sys/acct.h.
You must specifically turn on this function. See "Turning on Process Accounting".
Once process accounting is turned on, the cron program executes several
accounting commands, as specified in /var/spool/cron/crontabs/adm
and /var/spool/cron/crontabs/root. The commands in adm perform
monthly accounting (monacct), check the size of the pacct
file (ckpacct), and provide a daily accounting of processes and
connect time (runacct). The root crontab file runs the dodisk program,
which provides a report on current disk usage. These commands run automatically
when process accounting is turned on.
The login and init programs record connect sessions by
writing records into /etc/wtmp. This happens by default, as long
as the wtmp file exists.
Records of date changes, reboots, and shutdowns are copied from /etc/utmp
to /etc/wtmp by the acctwtmp command.
The acctwtmp utility is automatically called by runacct,
/usr/lib/acct/startup, and /usr/lib/acct/shutacct, once process accounting
is turned on.
The disk utilization programs acctdusg and diskusg break down disk usage by login and prepare reports. For more information on disk usage quotas, see "The quotas(4) Subsystem" in Chapter 3. These programs are run by the dodisk script.
To turn on process accounting:
Log in to the system as root.
Enter this command:
chkconfig acct on
Enter this command:
/usr/lib/acct/startup
This starts the kernel writing information into the file /var/adm/pacct.
Process accounting is started every time you boot the system, and every time the system boots, you should see a message similar to this:
System accounting started
Note that process accounting files, especially /var/adm/pacct, can grow very large. If you turn on process accounting, especially on a server, you should watch the amount of free disk space carefully. See "Controlling Accounting File Size".
To turn off process accounting, follow these steps:
Log in as root.
Enter this command:
chkconfig acct off
Enter this command:
/usr/lib/acct/shutacct
This stops the kernel from writing accounting information into the file /var/adm/pacct.
Process accounting is now turned off.
Process and disk accounting files can grow very large. On a busy system, they can grow quite rapidly.
To help keep the size of the file /var/adm/pacct under control, the cron command runs /usr/lib/acct/ckpacct to check the size of the file and the available disk space on the file system.
If the size of the pacct file exceeds 1000 blocks (by default), ckpacct runs the turnacct command with the switch argument. The switch argument causes turnacct to back up the pacct file (removing any existing backup copy) and start a new, empty pacct file. This means that at any time, no more than 2000 blocks of disk space are taken by pacct file information.
If the amount of free space in the file system falls below 500 blocks, ckpacct automatically turns off process accounting by running the turnacct command with the off argument. When at least 500 blocks of disk space are free, accounting is activated again the next time cron runs ckpacct.
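The size and free-space rules above can be sketched as a small shell function. This is only an illustration of the decision, not the actual ckpacct script: the function name ckpacct_action and its arguments are hypothetical, and it assumes the free-space check takes precedence over the size check. It does not model the reactivation of accounting once space is free again.

```shell
# ckpacct_action PACCT_BLOCKS FREE_BLOCKS [MAX_BLOCKS] [MIN_FREE]
# Print the turnacct argument the rules above would call for, or "none".
# Hypothetical helper; assumes the free-space check is made first.
ckpacct_action() {
    size=$1
    free=$2
    max=${3:-1000}       # default size threshold from the text
    min_free=${4:-500}   # default free-space threshold from the text

    if [ "$free" -lt "$min_free" ]; then
        echo off
    elif [ "$size" -gt "$max" ]; then
        echo switch
    else
        echo none
    fi
}
```

For example, ckpacct_action 1200 5000 prints switch, because the pacct file is over 1000 blocks while plenty of space remains.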
The directory /usr/lib/acct contains the programs and shell scripts necessary to run the accounting system. Process accounting uses the adm login, whose home directory is /var/adm, to perform certain tasks. /var/adm contains the active data collection files used by process accounting. Here is a description of the primary subdirectories in /var/adm:
/var/adm/acct/nite contains files that are reused daily by runacct.
/var/adm/acct/sum contains the cumulative summary files updated by runacct.
/var/adm/acct/fiscal contains periodic summary files created by monacct.
When IRIX enters multiuser mode, /usr/lib/acct/startup is executed as follows:
The acctwtmp program adds a "boot" record to /etc/wtmp.
This record is signified by using the system name as the login name in
the wtmp record.
Process accounting is started by turnacct, which, in turn, executes
acct on /var/adm/pacct.
remove is executed to clean up the saved pacct and wtmp files left in the sum directory by runacct.
The ckpacct procedure is run through cron every hour of the day to check the size of /var/adm/pacct. If the file grows past 1000 blocks (default), the turnacct switch is executed. The advantage of having several smaller pacct files becomes apparent when you try to restart runacct after a failure processing these records.
The chargefee program can be used to bill users for file restores, etc. It adds records to /var/adm/fee that are picked up and processed by the next execution of runacct and merged into the total accounting records.
runacct is executed through cron each night. It processes the active accounting files, /var/adm/pacct, /etc/wtmp, /var/adm/acct/nite/disktacct, and /var/adm/fee. It produces command summaries and usage summaries by login name.
When the system is shut down using shutdown, the shutacct shell procedure is executed. It writes a shutdown reason record into /etc/wtmp and turns process accounting off.
After the first reboot each morning, the administrator should execute /usr/lib/acct/prdaily to print the previous day's accounting report.
If you have installed the system accounting option, all the files and command lines for implementation have been set up properly. You may wish to verify that the entries in the system configuration files are correct. In order to automate the operation of the accounting system, you should check that the following have been done:
The file /etc/init.d/acct should contain the following lines (among others):
/usr/lib/acct/startup
/usr/lib/acct/shutacct
The first line starts process accounting during the system startup process;
the second stops it before the system is brought down.
For most installations, the following entries should be in /var/spool/cron/crontabs/adm so that cron automatically runs the daily accounting. These lines should already exist:
0 4 * * 1-6 if /etc/chkconfig acct; then /usr/lib/acct/runacct 2> /var/adm/acct/nite/fd2log; fi
5 * * * 1-6 if /etc/chkconfig acct; then /usr/lib/acct/ckpacct; fi
Note that each of the above cron entries occupies a single line in the crontab file. The following entry, which is also all on one line, should be in /var/spool/cron/crontabs/root:
0 2 * * 4 if /etc/chkconfig acct; then /usr/lib/acct/dodisk > /var/adm/acct/nite/disklog; fi
To facilitate monthly merging of accounting data, the following entry in /var/spool/cron/crontabs/adm allows monacct to clean up all daily reports and daily total accounting files, and deposit one monthly total report and one monthly total accounting file in the fiscal directory:
0 5 1 * * if /etc/chkconfig acct; then /usr/lib/acct/monacct; fi
The above command is all on one line in the source file, and takes advantage
of the default action of monacct that uses the current month's date
as the suffix for the file names. Notice that the entry is scheduled to run
after runacct has had sufficient time to complete. On the first
day of each month, this creates monthly accounting files with the entire month's
data.
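One way to verify the adm crontab entries described above is a short shell check along these lines. It is a sketch only: the function name missing_acct_jobs is hypothetical, and it merely looks for the command paths in the file without checking the schedule fields.

```shell
# missing_acct_jobs CRONTAB
# Print each accounting command named in the adm entries above that does
# not appear anywhere in the given crontab file. Hypothetical helper.
missing_acct_jobs() {
    for cmd in runacct ckpacct monacct; do
        if ! grep "/usr/lib/acct/$cmd" "$1" > /dev/null 2>&1; then
            echo "$cmd"
        fi
    done
}
```

Running missing_acct_jobs /var/spool/cron/crontabs/adm should print nothing when all three entries are present.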
You may wish to verify that an account exists for adm. Also, verify that the PATH shell variable is set in /var/adm/.profile to:
PATH=/usr/lib/acct:/bin:/usr/bin
To start up system accounting, simply type the commands:
chkconfig acct on
and
/usr/lib/acct/startup
The next time the system is booted, accounting will start.
runacct is the main daily accounting shell procedure. It is normally initiated by cron during nonpeak hours. runacct processes connect, fee, disk, and process accounting files. It also prepares daily and cumulative summary files for use by prdaily or for billing purposes. The following files produced by runacct are of particular interest:
runacct takes care not to damage files in the event of errors. A series of protection mechanisms is used to recognize an error, provide intelligent diagnostics, and terminate processing in such a way that runacct can be restarted with minimal intervention. It records its progress by writing descriptive messages into the file active. (Files used by runacct are assumed to be in the nite directory unless otherwise noted.) All diagnostics output during the execution of runacct are written into fd2log. runacct will complain if the files lock and lock1 exist when it is invoked. The lastdate file contains the month and day runacct was last invoked and is used to prevent more than one execution per day. If runacct detects an error, a message is written to /dev/console, mail is sent to root and adm, locks are removed, diagnostic files are saved, and execution is terminated.
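The lock and lastdate checks described above can be sketched as follows. This is not the runacct source: the function name preflight is hypothetical, it assumes lastdate holds the date as MMDD, and it treats the presence of either lock file as a lock.

```shell
# preflight NITE_DIR TODAY
# Report whether a run for TODAY (assumed MMDD) could proceed.
# Hypothetical helper; treats either lock or lock1 as a lock.
preflight() {
    nite=$1
    today=$2
    if [ -f "$nite/lock" ] || [ -f "$nite/lock1" ]; then
        echo "locks found"
    elif [ -f "$nite/lastdate" ] && [ "$(cat "$nite/lastdate")" = "$today" ]; then
        echo "already run"
    else
        echo "ok"
    fi
}
```

For example, preflight /var/adm/acct/nite 0601 prints ok only when no lock files exist and lastdate does not already contain 0601.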
To allow runacct to be restartable, processing is broken down into separate reentrant states. A file is used to remember the last state completed. When each state completes, statefile is updated to reflect the next state. After processing for the state is complete, statefile is read and the next state is processed. When runacct reaches the CLEANUP state, it removes the locks and terminates. States are executed as follows:
The runacct procedure can fail for a variety of reasons - usually due to a system crash, /usr running out of space, or a corrupted wtmp file. If the activeMMDD file exists, check it first for error messages. If the active file and lock files exist, check fd2log for any mysterious messages. The following are error messages produced by runacct and the recommended recovery actions:
ERROR: locks found, run aborted
The files /var/adm/acct/nite/lock and /var/adm/acct/nite/lock1
were found. These files must be removed before runacct can restart.
ERROR: acctg already run for date: check /var/adm/acct/nite/lastdate
The date in lastdate and today's date are the same. Remove lastdate.
ERROR: turnacct switch returned rc=?
Check the integrity of turnacct and accton. The accton
program must be owned by root and have the setuid bit set.
ERROR: Spacct?.MMDD already exists
File setups probably already run. Check status of files, then run setups
manually.
ERROR: /var/adm/acct/nite/wtmp.MMDD already exists, run setup manually
Self-explanatory.
ERROR: wtmpfix detected a corrupted wtmp file. Use fwtmp to correct the corrupted file.
Self-explanatory.
ERROR: connect acctg failed: check /var/adm/acct/nite/log
The acctcon1 program encountered a bad wtmp file. Use
fwtmp to correct the bad file.
ERROR: Invalid state, check /var/adm/acct/nite/active
The file statefile is probably corrupted. Check statefile for irregularities and read active before restarting.
The runacct program, called without arguments, assumes that this is the first invocation of the day. The argument MMDD is necessary if runacct is being restarted and specifies the month and day for which runacct will rerun the accounting. The entry point for processing is based on the contents of statefile. To override statefile, include the desired state on the command line. For example, to start runacct, use the command:
nohup runacct 2> /var/adm/acct/nite/fd2log &
To restart runacct:
nohup runacct 0601 2>> /var/adm/acct/nite/fd2log &
To restart runacct at a specific state:
nohup runacct 0601 WTMPFIX 2>> /var/adm/acct/nite/fd2log &
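The restart behavior rests on the statefile mechanism: each completed state records the next state, so a later run resumes where the previous one stopped. A minimal sketch of that pattern follows. The state list here is hypothetical (only WTMPFIX and CLEANUP are named in this chapter), the function names are invented for illustration, and the real work of each state is replaced by a message.

```shell
# Hypothetical state list; only WTMPFIX and CLEANUP are named in the text.
STATES="SETUP WTMPFIX PROCESS CLEANUP"

# next_state STATE: print the state that follows STATE,
# or nothing if STATE is the last one.
next_state() {
    found=0
    for s in $STATES; do
        if [ "$found" -eq 1 ]; then
            echo "$s"
            return 0
        fi
        if [ "$s" = "$1" ]; then
            found=1
        fi
    done
    return 0
}

# run_from STATEFILE: resume at the state recorded in STATEFILE
# (or at the first state if the file is missing or empty), and record
# the next state after each one completes.
run_from() {
    statefile=$1
    state=$(cat "$statefile" 2>/dev/null)
    [ -n "$state" ] || state=${STATES%% *}
    while [ -n "$state" ]; do
        echo "running $state"        # the work for this state would go here
        state=$(next_state "$state")
        echo "$state" > "$statefile"
    done
}
```

If a run is interrupted partway through, the statefile still names the state that was due next, so calling run_from again on the same file continues from that point rather than starting over.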
Sometimes, errors occur in the accounting system, and a file is corrupted or lost. You can ignore some of these errors, or simply restore lost or corrupted files from a backup. However, certain files must be fixed in order to maintain the integrity of the accounting system.
The wtmp files are the most delicate part of the accounting system. When the date is changed and the IRIX system is in multiuser mode, a set of date change records is written into /etc/wtmp. The wtmpfix program is designed to adjust the time stamps in the wtmp records when a date change is encountered. However, some combinations of date changes and reboots will slip through wtmpfix and cause acctcon1 to fail.
The following steps show how to fix a wtmp file:
cd /var/adm/acct/nite
fwtmp < wtmp.MMDD > xwtmp
ed xwtmp
Delete any corrupted records or delete all records from beginning up
to the date change.
fwtmp -ic < xwtmp > wtmp.MMDD
If the wtmp file is beyond repair, remove the file and create an empty wtmp file:
rm /etc/wtmp
touch /etc/wtmp
This prevents any charging of connect time. As a result, acctprc1 cannot determine which login owned a particular process; the process is charged to the login that appears first in the password file for that user ID.
If the installation is using the accounting system to charge users for system resources, the integrity of sum/tacct is quite important. Occasionally, mysterious tacct records appear with negative numbers, duplicate user IDs, or a user ID of 65,535. First check sum/tacctprev with prtacct. If it looks all right, the latest sum/tacct.MMDD should be patched up, then sum/tacct recreated. A simple patchup procedure would be:
Give the command:
cd /var/adm/acct/sum
Give the command:
acctmerg -v < tacct.MMDD > xtacct
Give the command:
ed xtacct
Remove the bad records.
Write duplicate UID records to another file.
Give the command:
acctmerg -i < xtacct > tacct.MMDD
Give the command:
acctmerg tacctprev < tacct.MMDD > tacct
Remember that the monacct procedure removes all the tacct.MMDD files; therefore, you can recreate sum/tacct by merging these files.
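When inspecting the ASCII file xtacct by hand, a filter such as the following may help locate suspect records before editing. It is a sketch under stated assumptions: the function name flag_tacct is hypothetical, and it assumes each record is one line whose first field is the user ID, whose second field is the login name, and whose remaining fields are numeric. Check the actual layout of your acctmerg -v output before relying on it.

```shell
# flag_tacct FILE
# Print the line number and reason for each suspect record in an ASCII
# tacct dump. Assumes field 1 is the UID and fields 3 onward are numeric.
flag_tacct() {
    awk '
        {
            reason = ""
            if ($1 == 65535)
                reason = "uid 65535"
            else if ($1 in seen)
                reason = "duplicate uid"
            else {
                for (i = 3; i <= NF; i++)
                    if ($i ~ /^-/) { reason = "negative value"; break }
            }
            seen[$1] = 1
            if (reason != "") print NR ": " reason
        }
    ' "$1"
}
```

The line numbers it prints can then be used in ed to remove the bad records or to write duplicate UID records to another file.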
The file /usr/lib/acct/holidays contains the prime/nonprime table for the accounting system. The table should be edited to reflect your location's holiday schedule for the year. The format is composed of three types of entries:
Comment Lines, which may appear anywhere in the file as long as the
first character in the line is an asterisk.
Year Designation Line, which should be the first data line (noncomment line) in the file and must appear only once. The line consists of three fields of four digits each (leading white space is ignored). For example, to specify the year as 1992, prime time at 9:00 a.m., and nonprime time at 4:30 p.m., the following entry is appropriate:
1992 0900 1630
A special condition allowed for in the time field is that the time 2400
is automatically converted to 0000.
Company Holidays Lines, which follow the year designation line and have the following general format:
day-of-year Month Day Description of Holiday
The day-of-year field is a number in the range of 1 through 366, indicating the day for the corresponding holiday (leading white space is ignored). The other three fields are actually commentary and are not currently used by other programs.
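The format rules above can be checked mechanically before the file is put into service. The following is a sketch only: the function name check_holidays is hypothetical, it skips blank lines (an assumption, since the text does not mention them), and it checks field shapes rather than whether the times themselves are sensible.

```shell
# check_holidays FILE
# Print "ok" if FILE follows the format described above; otherwise print
# the first problem found. Hypothetical helper.
check_holidays() {
    awk -v d4='^[0-9][0-9][0-9][0-9]$' '
        /^\*/   { next }        # comment line: first character is an asterisk
        NF == 0 { next }        # blank line (skipping these is an assumption)
        !seen_year {            # first data line must be the year designation
            seen_year = 1
            if (NF != 3 || $1 !~ d4 || $2 !~ d4 || $3 !~ d4) {
                print "bad year line: " $0; bad = 1; exit
            }
            next
        }
        $1 !~ /^[0-9]+$/ || $1 + 0 < 1 || $1 + 0 > 366 {
            print "bad day-of-year: " $0; bad = 1; exit
        }
        END { if (!bad) print (seen_year ? "ok" : "missing year line") }
    ' "$1"
}
```

For example, check_holidays /usr/lib/acct/holidays prints ok when the year line has three four-digit fields and every holiday line begins with a number from 1 through 366.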
runacct generates five basic reports upon each invocation. They cover the areas of connect accounting, usage by person on a daily basis, command usage reported by daily and monthly totals, and a report of the last time users were logged in. The following paragraphs describe the reports and the meanings of their tabulated data.
In the first part of the report, the from/to banner should alert the administrator to the period reported on. The period runs from the time the last accounting report was generated until the time the current accounting report was generated. The banner is followed by a log of system reboots, shutdowns, power fail recoveries, and any other records written into /etc/wtmp by the acctwtmp program. See the acct(1M) reference page for more information.
The second part of the report is a breakdown of line utilization. The TOTAL DURATION field tells how long the system was in multiuser state (able to be accessed through the terminal lines). The columns are:
While the system is running, /etc/wtmp should be monitored, since connect accounting is derived from this file. If it grows rapidly, execute acctcon1 to see which line is the noisiest. If the interrupting is occurring at a furious rate, general system performance will be affected.
The daily usage report gives a by-user breakdown of system resource utilization. Its data consists of:
These two reports are virtually the same except that the Daily Command Summary reports only on the current accounting period, while the Monthly Total Command Summary tells the story for the start of the fiscal period to the current date. In other words, the monthly report reflects the data accumulated since the last invocation of monacct.
The data included in these reports tells an administrator which commands are used most heavily. Based on those commands' characteristics of system resource utilization, the administrator can decide what to weigh more heavily when system tuning.
These reports are sorted by TOTAL KCOREMIN, which is an arbitrary yardstick but often a good one for calculating "drain" on a system.
The files listed here are located in the /var/adm directory:
The following files are located in the /var/adm/acct/nite directory:
The following files are located in the /var/adm/acct/sum directory:
The following files are located in the /var/adm/acct/fiscal directory:
The IRIX accounting system is designed to work as smoothly as possible. However, it is a complex system. If you plan to use the accounting system, it is a good idea to study carefully the reference pages for the various accounting programs. Also, keep accurate records in your system log books of how you set up the accounting system and how you changed it.
Copyright © 1997, Silicon Graphics, Inc. All Rights Reserved.