A commercial, upgraded version of the utility presented here is available for a reasonable cost from the author's company, including an updated interface and support for multi-volume databases. Contact jdj@jdjohnson.com or (214) 733-4958.
This article originally appeared in Progressions, a Progress-only magazine published by White Star Software.
It seems that one of the 'Universal Truths' of the MIS world is that we are always 'playing catch-up'. Under such conditions, things which should be included as part of the development process are often neglected or done in a haphazard fashion. One of the most frequently neglected part of an organization's processing is the purging of outdated data. Purges are usually performed in a reactive, rather than proactive, mode - often scheduled in response to a developing shortage of disk space or user complaints about poor performance. Many times this approach results from the fact that purges require 'manual' intervention by the DBA staff during off-hours. The approach described in the following paragraphs was developed to relieve our DBA staff of much of the burden of maintaining the roughly 20 Gb of data distributed among 60 Production databases on 10 Unix servers by automating the process to the point where our Computer Operators can assume much of the responsibility for its execution.
This process employs the dump/reload approach to take advantage of the resulting compaction of the database, along with performance improvements brought about by the physical reorganization of the data and indexes. Consequently, our purge utility consists of two separate, but related, functional pieces - A dump utility and a reload function. The dump utility can be run as any number of concurrent sessions, each dumping different files, but only one reload process is necessary, The core programs are designed to be run against any database on any system without modification. The only programs in this process that must be written specifically for each system are those which deal with specific record selection criteria.
The dump portion of this procedure, started in a Quality-Assurance database (Figure 1), consists of three preparatory programs (out-st.p, list-bld.p, and fd-bld.p), one control program , a generic dump program, and a series of system-specific purge programs. The out-st.p program (Figure 2) provides any entry point into the dump process. It performs as follows:
1. It determines whether or not it is the first copy being executed by checking the ctrl-ind field of the dump-ctrl record. If another copy is already running, control passes to Step 7.
2. User is prompted to update fields in the dump-ctrl record for the path to the database to be dumped, the destination for the output files, the path to the new database, and the directory to be used for the index-rebuild sort files.
3. Connection is made to the target database.
4. The dump-ctrl.ctrl-ind is set to 'y'.
5. list-bld.p is run, which reads the schema information from the target database and creates or resets the appropriate records in the QA database.
6. fd-bld.p is run, which uses the schema information from the target database to generate the bulkload fd files that will be needed to load the new database.
7. Runs the purge control program - purg-ctrl.p.
The purg-ctrl.p program (Figure 3) determines which target database file should be dumped next by reading the load-qa file of the QA database, finding the first record where the dump indicator = 'no' and the link indicator = 'no'. [NOTE: The link indicator field is set to 'yes' for all files which do not get dumped at all or are dumped in conjunction with other files. This has been predetermined by the DBA Group and is keyed through the use of dump-names beginning with 'x'.] For each file, it determines whether or not a specific purge program exists. If it does, that program is executed; otherwise, the control program executes the generic dump program (Figure 4), passing to it the name of the file to be dumped and the name of the output file to be used.
Upon its successful completion, each dump/purge progam outputs data to a file in a 'done' directory. The name of the output file is equivalent to the name of the bulkload 'fd' file which is needed to load the data into the new database. The file contains 1 record, listing the output record counts for the program and the date and time of program completion. The file serves three purposes: 1) its existence in the 'done' directory is the key used by the load process to signify that the data is available for import into the new database. 2) the filename is used by the load process to identify the appropriate bulkload 'fd' file to use for loading this data. 3) The data contained within this file can be used for Quality Assurance purposes by providing a written record of program execution and output count(s).
The load process is controlled by the execution of the Unix shellscript - bulkload.sh. (Figure 5) This script performs as follows:
A. It tests for the existence of data available for loading by checking the pop-ulation of the 'done' directory. If this directory is unpopulated, the script sleeps for 1 minute before checking again.
B. If the directory is populated, the script processes the entries in the directory in FIFO fashion.
C. For each entry, the script first executes the proutil bulkload option, using the name of the directory entry to determine the bulkload 'fd' file to use.
D. It then executes the PROGRESS bi-bld.p program to cause the generation of a .bi file entry. This has been found to necessary in the Pentium environment because the bulkload utility has a tendency to fail in reading the fd file unless it is slowed down by the need to first truncate the .bi file.
E. After building a .bi file for the new database, the script moves the processed key file from the 'done' directory to a 'history' directory. This ensures that the file will not be processed again while retaining it for its QA purposes.
F. The script appends the key filename to a 'loaded' file and checks the number of files which have been loaded. Once the count of the loaded file equals the total necessary for that particular database, the script terminates.
Once the bulkload.sh script has completed its job, it starts the load-qa.p program (Figure 6), which updates the load-qa file with input counts and performs comparisons between the output and input counts to ensure that all pertinent data has been successfully loaded. In the event of Quality-Assurance errors, the operator is provided with information regarding the source of the errors and can take corrective action. If all checks are passed, load-qa.p will build the appropriate proutil command line for performing the index rebuild and execute it.
The automation of such "mundane" tasks as this can go a long way toward maintaining or improving the reputation of an MIS department by maintaining reasonable systems response, limiting downtime, and providing the developers with more time to address more "important" issues.
Database: qa (PROGRESS) Filename: dump-ctrl Order Field-Name dType Ext Flgs Format Initial ----- -------------------------------- --------- ---- --------------- ---------- 10 ctrl-ind char i x X 20 dbase char x(60) 30 odest char x(50) 40 sdir char x(50) 50 new-db char x(60) Index Name (* indicates primary) Unique Field Name Seq Asc Abbr --------------------------------- ------ -------------------------- --- --- ---- ctrl-idx* yes ctrl-ind 1 yes no Filename: load-qa Order Field-Name dType Ext Flgs Format Initial ----- -------------------------------- --------- ---- --------------- ---------- 10 file-name char i x(20) 20 dump-name char x(12) 30 out-cnt deci0 >>>,>>>,>>9 0 40 in-cnt deci0 >>,>>>,>>9 0 50 dump-ind logi yes/no no 60 dump-prog char x(15) 70 link-ind logi yes/no no Index Name (* indicates primary) Unique Field Name Seq Asc Abbr --------------------------------- ------ -------------------------- --- --- ---- lqa-idx* yes file-name 1 yes no
/* PROGRAM: out-st.p
PURPOSE: To control entry into the automated purge process
and perform the appropriate database connections.
AUTHOR: J. D. Johnson DATE: 08/10/94
MODIFIED:
*/
{ctrl/dmp-var.i "NEW"}
do transaction:
find first dump-ctrl where dump-ctrl.ctrl-ind = 'x' exclusive
no-wait no-error.
if available dump-ctrl then do on error undo, retry:
update skip(1)
"Database to be Purged" skip
dump-ctrl.dbase skip(1)
"Output destination for dump files"
dump-ctrl.odest skip(1)
"Directory for Index-Build Sort" skip
dump-ctrl.sdir skip(1)
"Full Path to New Database" skip
dump-ctrl.new-db
skip(1)
with centered no-labels row 3 title "Purge Startup" frame ost1.
if r-index(dump-ctrl.odest, "/") = length(dump-ctrl.odest) then
assign dump-ctrl.odest =
substring(dump-ctrl.odest, 1, length(dump-ctrl.odest) - 1).
connect value(dump-ctrl.dbase) -RO -ld prod no-error.
if not connected ("prod") then do:
message " Could not connect to requested database. Try again. ".
undo, retry.
end.
assign dump-ctrl.ctrl-ind = 'y'.
status default "Running list-bld.p".
run ctrl/list-bld.p.
status default "Running fd-bld.p".
run ctrl/fd-bld.p.
end.
end.
status default.
find first dump-ctrl no-lock no-error.
if not connected ("prod") then do:
connect value(dump-ctrl.dbase) -RO -ld prod no-error.
if not connected ("prod") then do:
message " Could not connect to requested database. Try again. ".
undo, return.
end.
end.
assign dest-dir = dump-ctrl.odest.
run ctrl/purg-ctrl.p.
/* PROGRAM: ctrl/purge-ctrl.p
PURPOSE: This program finds the first available load-qa record that represents
a file not having been dumped and executes the appropriate dump program.
AUTHOR: J. D. Johnson
DATE: 07/22/94
MODIFIED:
*/
{ctrl/dmp-var.i}
def var prog-name as char format "x(20)"
no-undo.
main-blk:
repeat:
do transaction:
find next load-qa where link-ind = no exclusive no-wait
no-error.
if not available load-qa then do:
if locked load-qa then next main-blk.
else quit.
end.
else do:
if dump-ind then next main-blk.
assign dmp-nme = load-qa.dump-name
load-qa.dump-ind = yes
lqa-recid = recid(load-qa)
lqa-dprog = load-qa.dump-prog
lqa-fname = load-qa.file-name.
release load-qa.
end.
end.
status default "Dumping " + lqa-fname.
assign dmp-cnt = 0.
if lqa-dprog = "" then
run prog/gendump.p value(lqa-fname).
else do:
assign prog-name = "prog/" + lqa-dprog.
run value(prog-name).
end.
do transaction:
find load-qa where recid(load-qa) = lqa-recid exclusive.
assign load-qa.out-cnt = dmp-cnt.
end.
end.
/* PROGRAM: prog/gendump.p
PURPOSE: This program takes runtime parameters from
purge-ctrl.p and dumps the
appropriate file.
AUTHOR: J. D. Johnson
DATE: 07/22/94
MODIFIED:
*/
{ctrl/dmp-var.i}
assign opath = dest-dir + "/" + dmp-nme + ".d".
output to value(opath).
status default "Dumping " + dmp-nme.
for each {1} no-lock:
export {1}.
assign dmp-cnt = dmp-cnt + 1.
if dmp-cnt modulo 100 = 0 then do:
pause 1 no-message.
status default "Dumping " + dmp-nme + ":
" + string(dmp-cnt).
end.
end.
output close.
assign opath = "/usr1/purge/done/" + dmp-nme.
output to value(opath).
export dmp-cnt string(time,"HH:MM:SS") today.
output close.
# PROGRAM: bulkload.sh
# PURPOSE: This script checks the /usr1/purge/done directory for the
# existence of files indicating that the dump of those database
# records has been completed. If 1 or more files exist, it uses
# the name of the oldest file to feed the PROGRESS bulkload
# utility to load the appropriate data into the new database.
# AUTHOR: J. D. Johnson DATE: 07/22/94
# MODIFIED:
#set -x
while true
do
clear
echo "\nInput the path to the data files\n\t\c"
read DPATH
echo "\nInput the name of the database being built:\c"
read NDB
echo "\nInput the path to the new database\n\t\c"
read NPATH
clear
echo "You wish to load data from the " $DPATH" directory into\n\n\t\c"
echo ${NPATH}/${NDB} "\n\n\t\t\t\tIs this correct?(Y/N) [.]\b\b\c"
read ANS
case $ANS in
y|Y)
break
;;
*)
continue
;;
esac
done
echo "\nIs this the first execution of bulkload.sh?:
[.]\b\b\c"
read FE
case $FE in
y|Y)
rm /usr1/purge/ctrl/bload.adv /usr1/purge/ctrl/loaded 1>/dev/null 2>/dev/null
rm /usr1/purge/done/* 1>/dev/null 2>/dev/null
rm /usr1/purge/ldone/* 1>/dev/null 2>/dev/null
echo "\nDo you wish to create the 'Recipient' database?:
[.]\b\b\c"
read CDB
case $CDB in
y|Y)
prodb ${NPATH}/${NDB} /usr2/ccprog/SCHEM/${NDB}
;;
*)
;;
esac
;;
*)
;;
esac
clear
echo "\n\n\n\n\n\n\n\n\n\n\t\t\tBulkload.sh in operation"
PROPATH=/usr1/purge;export PROPATH
ADV=/usr1/purge/ctrl/bload.adv
echo "Bulkload.sh started at" `date` >> $ADV
TOT=`cat /usr1/purge/ctrl/file.cnt`
CNT=0
cd /usr1/purge/done
while [ $CNT -lt $TOT ]
do
DCNT=`ls | wc -l | awk '{print $1}'`
case $DCNT in
0)
sleep 60
;;
*)
FN=`ls -t | tail -1`
echo "Loading " $FN >> $ADV
cd $DPATH
/usr/dlc/proutil ${NPATH}/${NDB} -C bulkload /usr1/purge/fd/${FN}.fd | tee -a $ADV
sleep 5
/usr/dlc/bpro ${NPATH}/${NDB} -p prog/bi-bld.p
echo $FN >> /usr1/purge/ctrl/loaded
cd /usr1/purge/done
mv $FN ../ldone
CNT=`wc -l /usr1/purge/ctrl/loaded | awk '{print $1}'`
sleep 10
;;
esac
done
echo "bulkload.sh completed at" `date` >>
$ADV
/* PROGRAM: load-qa.p
PURPOSE: To capture statistics produced by the automated
purge process and report any errors to the operator.
AUTHOR: J. D. Johnson DATE: 08/09/94
MODIFIED:
*/
def var in-fn as char format "x(20)" extent 5.
def var in-ct as char format "x(20)" extent 5.
def var idb-cmd as char format "x(80)".
def var good-fn as log.
/* Pull in actual load stats from bulkload.sh execution advertisement */
input through /usr1/purge/ctrl/load-sum.sh no-echo.
sum-blk:
repeat:
set in-fn[1] in-fn[2].
if index("1234567890",substring(in-fn[1],1,1))= 0 then do:
find load-qa where load-qa.file-name = in-fn[1] no-error.
if not available load-qa then next sum-blk.
assign good-fn = yes.
end.
else
if good-fn then assign load-qa.in-cnt = dec(in-fn[2])
good-fn = no.
end.
/* Finally, check that all counts match and report those
that don't. */
assign good-fn = yes.
for each load-qa:
if load-qa.in-cnt ne load-qa.out-cnt then do:
display load-qa.file-name
load-qa.dump-name
load-qa.out-cnt
load-qa.in-cnt
with 4 col.
assign good-fn = no.
end.
end.
if good-fn then do:
display skip(1)
" All Counts Match "
skip(1)
with centered row 10.
pause 10.
find first dump-ctrl.
output to /usr1/purge/ctrl/idx-bld.sh.
put unformatted "/usr/dlc/proutil " + dump-ctrl.new-db
+ " -i -TB 31 -TM 32 -T " + dump-ctrl.sdir + "
-C idxbuild all\n".
output close.
unix silent /usr1/purge/ctrl/idx-ctrl.sh&.
quit.
end.
/* PROGRAM: dmp-var.i
PURPOSE: To define variables used in out-st.p, purg-ctrl.p,
and gendump.p
AUTHOR: J. D. Johnson
DATE: 07/22/94
MODIFIED:
*/
def {1} shared var dest-dir as char format "x(60)".
def {1} shared var dmp-nme like load-qa.dump-name.
def {1} shared var lqa-recid as recid.
def {1} shared var lqa-dprog like load-qa.dump-prog.
def {1} shared var lqa-fname like load-qa.file-name.
def {1} shared var dmp-cnt like load-qa.out-cnt.
def var opath as char format "x(50)".
--- John D. Johnson, President Voice: (214) 733-4958 J. D. Johnson & Assoc Email: jdj@jdjohnson.com 5730 Buffridge Tr Dallas, TX 75252