There are two major reasons for checkpointing. The first is that the computation is so time-consuming (say, at least a day) that the user may want to save its state in case of a hardware or system failure. The second is that the computation exceeds the maximum number of iterations initially set, and the user may want to continue it to convergence without starting over completely. For example, suppose the user sets NEV = 6, and after IPARAM(3) = MXITER iterations only five Ritz values satisfy the convergence requirement specified by TOL. The user would then increase the value of MXITER and resume the computation.
We briefly explain the procedure for checkpointing the ARPACK codes, using the double precision symmetric code dsaupd as an example. The example shows how to save the state of the computation every 10 iterations. This requires a minor modification to the source files dsaupd.f and dsaup2.f found in the ARPACK subdirectory SRC (see Chapter 1). These modified codes are already available at the ftp site in the directory pub/software/ARPACK/CONTRIBUTED.
The driver routine is called dssave. When it is executed, it first looks for the file arpack_state in the current working directory. This is done by the following statements:
c
c     %----------------------------------------------%
c     | Open the data file that will save the state. |
c     %----------------------------------------------%
c
      open(12,err=99,file='arpack_state',status='new')
This statement attempts to create the file arpack_state. Because the open statement specifies status='new', an error occurs if a file named arpack_state already exists, and control jumps to the statement labeled 99. The open therefore succeeds only if no file named arpack_state is present; in that case a new file is created and execution continues with the statements immediately following the open statement.
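The same create-or-reopen test can be written in free-form Fortran 90 with iostat= in place of the err= labels. The following is a minimal sketch under stated assumptions, not the modified ARPACK driver: the module and subroutine names (state_file_mod, open_state_file) are hypothetical, and, like the original, it treats any failure of the status='new' open as meaning that the file already exists.

```fortran
module state_file_mod
  implicit none
contains
  ! Hypothetical helper. Try to create fname on unit iounit: status='new'
  ! succeeds only when the file does not exist yet (fresh run). If that
  ! fails, assume a saved state is present and reopen with status='old'.
  subroutine open_state_file(iounit, fname, resumed, ierr)
    integer, intent(in) :: iounit
    character(len=*), intent(in) :: fname
    logical, intent(out) :: resumed   ! .true. if an existing file was opened
    integer, intent(out) :: ierr      ! 0 on success, iostat value otherwise
    integer :: ios

    resumed = .false.
    open(unit=iounit, file=fname, status='new', iostat=ios)
    if (ios == 0) then
      ierr = 0
      return
    end if

    ! Counterpart of the jump to label 99 in the original driver.
    open(unit=iounit, file=fname, status='old', iostat=ios)
    ierr = ios
    resumed = (ios == 0)
  end subroutine open_state_file
end module state_file_mod
```

A driver would call open_state_file once at start-up and read the saved state only when resumed is .true.. Calling inquire(file=fname, exist=...) first is an alternative that separates a missing file from other open errors, such as a permissions problem.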
The code listed in Figure B.3 is executed when the file arpack_state already exists. It reopens the file with status='old' and reads in the previously saved state of the computation.
c
c     %-----------------------------------------%
c     | The state file exists, so read it in.   |
c     %-----------------------------------------%
c
   99 open(12,err=199,file='arpack_state',status='old')
      print*,'dssave: input existing state'
      iounit = 12
      read(iounit,8000) ido, bmat, n, which, nev, tol, ncv,
     &                  iparam, ipntr, lworkl, info, np,
     &                  rnorm, nconv, nev2
      valfmt = '(3e22.16)'
      read(iounit,valfmt) (resid(i), i = 1, n)
      ntmp = 3*n
      read(iounit,valfmt) (workd(i), i = 1, ntmp)
      read(iounit,valfmt) (workl(i), i = 1, lworkl)
      do 7002 j=1,ncv
         read(iounit,valfmt) (v(i,j), i = 1, n)
 7002 continue
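Because the read in Figure B.3 must consume exactly the records written with format 8000, the header has to be read back with the same edit descriptors used to write it. The sketch below round-trips the scalar part of the state through those descriptors in free-form Fortran. It assumes, as in dsaupd, that iparam and ipntr each have 11 entries, so the second record (12i5) holds ncv followed by iparam and the third (13i5) holds ipntr followed by lworkl and info. The derived type ckpt_header and the module and routine names are hypothetical.

```fortran
module header_io_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! Hypothetical container for the scalar state listed in the text.
  ! iparam and ipntr are assumed to have length 11, as in dsaupd.
  type :: ckpt_header
    integer :: ido, n, nev, ncv, lworkl, info, np, nconv, nev2
    character(len=1) :: bmat
    character(len=2) :: which
    real(dp) :: tol, rnorm
    integer :: iparam(11), ipntr(11)
  end type ckpt_header

  ! Same edit descriptors as format 8000 in the text (four records).
  character(len=*), parameter :: hdrfmt = &
    '(i2,a1,i14,a2,i14,d23.16,16x,/,12i5,12x,/,13i5,7x,/,i5,d23.16,i5,i5)'

contains

  subroutine write_header(iounit, h)
    integer, intent(in) :: iounit
    type(ckpt_header), intent(in) :: h
    write(iounit, hdrfmt) h%ido, h%bmat, h%n, h%which, h%nev, h%tol, &
                          h%ncv, h%iparam, h%ipntr, h%lworkl, h%info, &
                          h%np, h%rnorm, h%nconv, h%nev2
  end subroutine write_header

  ! The input list must match the output list item for item.
  subroutine read_header(iounit, h)
    integer, intent(in) :: iounit
    type(ckpt_header), intent(out) :: h
    read(iounit, hdrfmt) h%ido, h%bmat, h%n, h%which, h%nev, h%tol, &
                         h%ncv, h%iparam, h%ipntr, h%lworkl, h%info, &
                         h%np, h%rnorm, h%nconv, h%nev2
  end subroutine read_header

end module header_io_mod
```

In format 8000 the skips 16x, 12x and 7x make the edit descriptors of each of the first three records span 72 columns (2+1+14+2+14+23+16, 12*5+12 and 13*5+7).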
The code listed in Figure B.4 writes the state of the computation whenever dsaupd returns with ido = -2, which happens when the number of iterations reaches 10; this test is carried out within the modified subroutine dsaup2. The loop do 100 restrt = 1,mxstrt allows the state to be written to the file arpack_state up to mxstrt times. Before each save the file arpack_state is rewound, so each save overwrites its previous contents and the file holds only the most recent state.
c
c     %---------------------------------%
c     | Start of the checkpointing loop |
c     %---------------------------------%
c
      mxstrt = 3
      do 100 restrt = 1,mxstrt
c
c        %------------------------------------------------%
c        | M A I N   L O O P (Reverse communication loop) |
c        %------------------------------------------------%
c
   10    continue
c
c        %---------------------------------------------%
c        | Repeatedly call the routine DSAUPD and take |
c        | actions indicated by parameter IDO until    |
c        | either convergence is indicated or maxitr   |
c        | has been exceeded.                          |
c        %---------------------------------------------%
c
         call dsaupd ( ido, bmat, n, which, nev, tol, resid,
     &                 ncv, v, ldv, iparam, ipntr, workd, workl,
     &                 lworkl, info, np, rnorm, nconv, nev2 )
c
         if (ido .eq. -2) then
c
c           %---------------------------------------------------%
c           | After maxitr iterations without convergence,      |
c           | output the computed quantities to the state file. |
c           %---------------------------------------------------%
c
            rewind(iounit,err=399)
            write(iounit,8000) ido, bmat, n, which, nev, tol,
     &                         ncv, iparam,
     &                         ipntr, lworkl, info,
     &                         np, rnorm, nconv, nev2
 8000       format(i2,a1,i14,a2,i14,d23.16,16x,/,
     &             12i5,12x,/,
     &             13i5,7x,/,
     &             i5,d23.16,i5,i5)
            ifmt = 16
            len = ifmt + 6
            nperli = 3
            write(valfmt,8001) nperli,len,ifmt
 8001       format(1h(,i1,1he,i2,1h.,i2,1h))
            write(iounit,valfmt) (resid(i), i = 1, n)
            ntmp = 3*n
            write(iounit,valfmt) (workd(i), i = 1, ntmp)
            write(iounit,valfmt) (workl(i), i = 1, lworkl)
            do 8002 j=1,ncv
               write(iounit,valfmt) (v(i,j), i = 1, n)
 8002       continue
            go to 100
         endif
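The save logic above can be condensed into a sketch that keeps the two points made in the text: the file is rewound before every save, so it only ever holds the latest state, and the array format is built at run time. It is written in free-form Fortran, with quoted character strings in place of the obsolete Hollerith descriptors (1h( and so on) used in format 8001. The module and routine names are hypothetical, the header record is omitted, and only resid, workd and v are saved to keep the example short.

```fortran
module ckpt_arrays_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Build the array format at run time, as format 8001 does: nperli values
  ! per record, each in an E field of width ifmt+6 with ifmt digits.
  function array_format(nperli, ifmt) result(valfmt)
    integer, intent(in) :: nperli, ifmt
    character(len=12) :: valfmt
    write(valfmt, '("(", i1, "e", i2, ".", i2, ")")') nperli, ifmt + 6, ifmt
  end function array_format

  ! Save resid, workd and the columns of v. The unit is rewound first, so
  ! each call overwrites the previous checkpoint instead of appending to it.
  subroutine save_arrays(iounit, resid, workd, v)
    integer, intent(in) :: iounit
    real(dp), intent(in) :: resid(:), workd(:), v(:, :)
    character(len=12) :: valfmt
    integer :: j

    valfmt = array_format(3, 16)        ! gives '(3e22.16)'
    rewind(iounit)
    write(iounit, valfmt) resid
    write(iounit, valfmt) workd
    do j = 1, size(v, 2)
      write(iounit, valfmt) v(:, j)
    end do
  end subroutine save_arrays

  ! Read the arrays back with the fixed format used in Figure B.3.
  subroutine load_arrays(iounit, resid, workd, v)
    integer, intent(in) :: iounit
    real(dp), intent(out) :: resid(:), workd(:), v(:, :)
    integer :: j

    rewind(iounit)
    read(iounit, '(3e22.16)') resid
    read(iounit, '(3e22.16)') workd
    do j = 1, size(v, 2)
      read(iounit, '(3e22.16)') v(:, j)
    end do
  end subroutine load_arrays

end module ckpt_arrays_mod
```

With nperli = 3 and ifmt = 16, array_format returns '(3e22.16)', the string that the read side hard-codes, so the two halves stay consistent only as long as those parameters are left unchanged.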