
Check Pointing ARPACK

There are several situations in which it is desirable to be able to recover from an unexpected interruption in a computation. One way to accomplish this is to save the state of the computation at regular intervals, or check points. After an interruption, the computation can be resumed from the last check point saved before the fault occurred. This section explains how to implement check pointing with the ARPACK codes. Familiarity with the ARPACK codes and with the reverse communication protocol is assumed.

There are two major reasons why check pointing might be done. The first is that the computation is sufficiently time consuming (say, at least a day) that the user wants to save its state in case of a hardware or system failure. The second is that the computation reaches the maximum number of iterations initially set, and the user wishes to continue to convergence without starting over. For example, the user initially sets NEV=6, and after IPARAM(3) = MXITER iterations only five Ritz values satisfy the convergence requirement specified by TOL. The user would then increase the value of MXITER and resume the computation.

We briefly explain the procedure for check pointing the ARPACK codes using the double precision symmetric code dsaupd as an example. The example shows how to save the state of the computation every 10 iterations. This will require a minor modification to the source codes dsaupd.f and dsaup2.f found in the ARPACK subdirectory SRC (see Chapter 1). These modified codes are already available at the ftp site in the directory pub/software/ARPACK/CONTRIBUTED.

The driver routine is called dssave. When executed, it first looks for the file arpack_state in the current working directory. This is done by the following statement:

c
c     %----------------------------------------------%
c     | Open the data file that will save the state. |
c     %----------------------------------------------%
c
      open(12,err= 99,file='arpack_state',status='new')

This statement attempts to open the file arpack_state. Because the open statement specifies status='new', an error occurs if a file named arpack_state already exists, and control jumps to the statement labeled 99. The open succeeds only if no file named arpack_state exists; in that case, the statements immediately following the open statement are executed.
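The interplay of status='new' and the err= label can be tried in isolation. The following stand-alone sketch is not part of dssave or ARPACK; the program name opnchk and the file name demo_state are hypothetical, chosen so that a real arpack_state file is not touched. It assumes a compiler that accepts fixed-form Fortran 77.

```fortran
      program opnchk
c
c     Hypothetical stand-alone sketch, not part of ARPACK or dssave.
c     status='new' fails when the file exists, so control jumps to
c     the err= label, where the file is reopened with status='old'.
c
      integer k
c
      open(12,err=99,file='demo_state',status='new')
      print*,'opnchk: no saved state, starting fresh'
      write(12,'(i5)') 1
      go to 200
c
 99   open(12,err=199,file='demo_state',status='old')
      read(12,'(i5)') k
      print*,'opnchk: found saved state, value = ', k
      go to 200
c
 199  print*,'opnchk: could not open demo_state'
      stop
c
 200  close(12)
      end
```

Run twice in a directory that does not yet contain demo_state, the first run should take the fresh-start branch and create the file, and the second run should take the branch at label 99 and read the value back.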

The section of code listed in Figure B.3 is executed when the file arpack_state exists. The code reads in a previous state of computation.


  
c
c     %-----------------------------------------%
c     | A file state exists, so read it in.     |
c     %-----------------------------------------%
c
 99   open(12,err=199,file='arpack_state',status='old')
      print*,'dssave: input existing state'
      iounit = 12
      read(iounit,8000) ido, bmat, n, which, nev, tol, ncv,
     &                  iparam, ipntr, lworkl, info, np,
     &                  rnorm, nconv, nev2
      valfmt = '(3e22.16)'
      read(iounit,valfmt) (resid(i), i = 1, n)
      ntmp = 3*n
      read(iounit,valfmt) (workd(i), i = 1, ntmp)
      read(iounit,valfmt) (workl(i), i = 1, lworkl)
      do 7002 j=1,ncv
         read(iounit,valfmt) (v(i,j), i = 1, n)
 7002 continue
Figure B.3: Reading in a previous state with the example program dssave.
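The array reads in Figure B.3 rely on the file having been written with the same format string, three values per line in e22.16. The following stand-alone sketch illustrates that round trip on a small array. It is not part of dssave or ARPACK; the program name rdchk, the file name demo_vals, and the unit number 13 are hypothetical.

```fortran
      program rdchk
c
c     Hypothetical round-trip sketch, not part of ARPACK or dssave.
c     An array written three values per line with e22.16 is read
c     back with the same format string, as in Figure B.3.
c
      integer          n, i
      parameter        (n = 7)
      double precision x(n), y(n), err
      character*10     valfmt
c
      valfmt = '(3e22.16)'
      do 10 i = 1, n
         x(i) = 1.0d0 / dble(i)
 10   continue
c
      open(13,file='demo_vals',status='unknown')
      write(13,valfmt) (x(i), i = 1, n)
      rewind(13)
      read(13,valfmt) (y(i), i = 1, n)
      close(13,status='delete')
c
c     Largest difference between the saved and restored values.
c
      err = 0.0d0
      do 20 i = 1, n
         err = max(err, abs(x(i) - y(i)))
 20   continue
      print*,'rdchk: max abs round-trip error = ', err
      end
```

Sixteen significant digits are not always enough to reproduce a double precision value bit for bit, so the printed difference may be tiny but nonzero. A restarted computation can therefore differ slightly in the last digits from an uninterrupted one.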

The code listed in Figures B.4 and B.5 writes the state of the computation when ido=-2. This occurs when the number of iterations reaches 10, and the value is set inside the modified subroutine dsaup2. The loop do 100 restrt = 1,mxstrt allows up to mxstrt writes of the state to the file arpack_state. Note that the file is rewound before each save, so each save overwrites the previous contents of arpack_state.


  
c
c     %---------------------------------%
c     | Start of the checkpointing loop |
c     %---------------------------------%
c
      mxstrt = 3
      do 100 restrt = 1,mxstrt
c
c        %------------------------------------------------%
c        | M A I N   L O O P (Reverse communication loop) |
c        %------------------------------------------------%
c
 10      continue
c
c        %---------------------------------------------%
c        | Repeatedly call the routine DSAUPD and take |
c        | actions indicated by parameter IDO until    |
c        | either convergence is indicated or maxitr   |
c        | has been exceeded.                          |
c        %---------------------------------------------%
c
         call dsaupd ( ido, bmat, n, which, nev, tol, resid,
     &        ncv, v, ldv, iparam, ipntr, workd, workl,
     &        lworkl, info, np, rnorm, nconv, nev2 )
c    
Figure B.4: Writing a state with the example program dssave.


  
         if (ido .eq. -2) then
c
c        %---------------------------------------------------%
c        | After maxitr iterations without convergence,      |
c        | output the computed quantities to the file state. |
c        %---------------------------------------------------%
c
            rewind(iounit,err=399)
            write(iounit,8000) ido, bmat, n, which, nev, tol,
     &           ncv, iparam,
     &           ipntr, lworkl, info,
     &           np, rnorm, nconv, nev2
 8000       format(i2,a1,i14,a2,i14,d23.16,16x,/,
     &           12i5,12x,/,
     &           13i5,7x,/,
     &           i5,d23.16,i5,i5)
            ifmt = 16
            len  = ifmt + 6
            nperli = 3
            write(valfmt,8001) nperli,len,ifmt
 8001       format('(',i1,'e',i2,'.',i2,')')
            write(iounit,valfmt) (resid(i), i = 1, n)
            ntmp = 3*n
            write(iounit,valfmt) (workd(i), i = 1, ntmp)
            write(iounit,valfmt) (workl(i), i = 1, lworkl)
            do 8002 j=1,ncv
               write(iounit,valfmt) (v(i,j), i = 1, n)
 8002       continue
            go to 100
         endif
Figure B.5: Writing a state with the example program dssave (continued).
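The control flow of Figures B.4 and B.5 can be seen more easily in a toy program. In the sketch below, the subroutine step is a hypothetical stand-in for the modified dsaupd: it returns ido = -2 after every 10 iterations and ido = 99 once its made-up computation finishes at iteration 25. The names ckloop, step, demo_ckpt, and iwid, the unit number 14, and the three-element state vector are all assumptions made for illustration; only the loop structure, the rewind before each save, and the construction of the format string follow the figures.

```fortran
      program ckloop
c
c     Hypothetical sketch, not part of ARPACK. The subroutine
c     step stands in for the modified dsaupd: it returns
c     ido = -2 after every 10 iterations and ido = 99 when done.
c
      integer          ido, iter, restrt, mxstrt, iounit
      integer          ifmt, iwid, nperli
      double precision x(3)
      character*10     valfmt
c
      iounit = 14
      open(iounit,file='demo_ckpt',status='unknown')
c
c     Build the string '(3e22.16)' with an internal write.
c     iwid plays the role of len in Figure B.5.
c
      ifmt   = 16
      iwid   = ifmt + 6
      nperli = 3
      write(valfmt,8001) nperli, iwid, ifmt
 8001 format('(',i1,'e',i2,'.',i2,')')
c
      ido    = 0
      iter   = 0
      mxstrt = 3
      do 100 restrt = 1, mxstrt
 10      continue
         call step (ido, iter, x)
         if (ido .eq. -2) then
c
c           Rewind so this save overwrites the previous one.
c
            rewind(iounit)
            write(iounit,'(i5)') iter
            write(iounit,valfmt) x
            print*,'ckloop: saved state at iteration ', iter
            go to 100
         endif
         if (ido .ne. 99) go to 10
         go to 200
 100  continue
 200  close(iounit)
      end
c
      subroutine step (ido, iter, x)
      integer          ido, iter
      double precision x(3)
c
c     Stand-in for one iteration of a long computation.
c
      iter = iter + 1
      x(1) = dble(iter)
      x(2) = 2.0d0*dble(iter)
      x(3) = 3.0d0*dble(iter)
      ido  = 1
      if (mod(iter,10) .eq. 0) ido = -2
      if (iter .ge. 25) ido = 99
      return
      end
```

With these settings the state is saved at iterations 10 and 20, and the loop exits at iteration 25 before a third save. Because of the rewind, demo_ckpt holds only the state from iteration 20 when the program ends.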


Chao Yang
11/7/1997