Managing Multiple Shell and Sql Job Runs

01 Sep 2020

# linux # shell # sql

Managing multiple shell and sql job runs

DBAs and application admins face a common problem. They start with couple of cron jobs scheduled, but overtime, we keep adding maintenance jobs, notification jobs, cleanup jobs and the list keeps growing. To maintain logical separation, many directories specific for the jobs were added and for each SQL file there is a corresponding SHELL script in crontab which takes care logging and notification.

When one of the common script changes, most likely all the dependent scripts were to be changed as well. I was managing around 10 servers and each of them had 20+ such jobs.

It’s time to simplify the process. Few of the requirements:

The script should check if the job is already running - if not, start it.
Save the output to its own log file. Keep an archive and a copy of the old file when it hits a specified size.
Run the job. If its a SQL file, use SQLPLUS to kick it off.
Scan the output from the job. If errors - specifically “ORA-” errors, trigger an email irrespective of the job mail settings.

The SQLPLUS script connects to database using “/” (OS authentication) and connects again as required user within the script.

SQL run script - runsql.sh


# This script is called from run.sh which manages the requirements listed above.
# set Oracle environment variables

. ~user/oraenv

# script name - first argument
name=$1
shift

# Run sqlplus - script name and rest of arguments.
sqlplus -s / @$name $*

SHELL script - run.sh

Main script. Call this with the program to run. Like run.sh /home/user1/monthly.sh or run.sh /home/user1/table_maintain.sql


#!/usr/bin/ksh
# #############################################################################
# Title: run.sh
# Description: Script that runs other programs, logs messages and send mails
#
# #############################################################################


# #############################################################################
# Name: arch_file
# Purpose: Keep history of log data. If size exceeds MAX_ARCH_SIZE, then
#          make one backup. Useful as not all logs are sent in mail.
#
# #############################################################################

function arch_file
{
   MAX_ARCH_SIZE=10000000
   arch_curr=${1}.arch
   arch_prev=${1}.prev

   if [[ -r $arch_curr ]]
   then
      size_arch_curr=`ls -l $arch_curr|awk '{print $5}'`
      if [[ $size_arch_curr -gt $MAX_ARCH_SIZE ]]
      then
         mv $arch_curr $arch_prev
      fi
   fi

   echo "=============================================================\n" >> $arch_curr
cat $log >> $arch_curr
}
# End function arch_file

# #############################################################################
# Name: check_run_status
# Purpose: Check if the program is already running.
#          If YES, return 1, else 0.
# #############################################################################

function check_run_status
{

   if [[ "`ps -ef | egrep -v \"grep|vi|more|less|run.sh\" | egrep -c $1`" -gt 0 ]]
   then
      echo  "Program $1 is already running. EXIT"
      OVERRIDE_FILE=/opt/apps/bin/run.override
      if [[ -f $OVERRIDE_FILE ]]
      then
        lcnt=`grep -c $1 $OVERRIDE_FILE`
        if [[ $lcnt -gt 0 ]]
        then
           echo "Program $1 found in Override file $OVERRIDE_FILE...CONTINUE"
           return 0
        fi
      fi
      return 1
   else
   #      echo  "Program $1 not running. Continue."
      return 0
   fi
}

# End function check_run_status

. ~user1/oraenv
BINDIR=/opt/apps/bin

echo "$0 $* on `date`" >> /tmp/run.log
if [ $# -eq 0 ]
then
  echo "ERROR: USAGE `basename $0` script"
  exit 1
fi

export TZ=MST7MDT

RUN_USER="$LOGNAME@`uname -n`"

cd `dirname $1`

name=`basename $1`
if [ ! -r "$name" ]
then
echo "ERROR: `basename`: Could not find $name in `pwd`"
exit 1
fi

shift

# Get file name and extension
l_filename=`echo "$name" |tr '[A-Z]' '[a-z]'`
l_ext=`echo "$l_filename" | sed 's/^.*\.//'`

# Create the log file with same file name as script and .diag extension
log="/opt/apps/out/${l_filename%.*}.diag"


mv $log $log.old 2>/dev/null
{
   echo $RUN_USER running $name at `date +%Y%m%d_%H%M%S`

#  Check if program is already running.

   check_run_status $name
   ostats=$?

   if [[ $ostats -ne 0 ]]
   then
      echo " $name is already running. Exiting"
   else
   # run the program
      if [[ $l_ext = "sql" ]]
      then
         ksh $BINDIR/runsql.sh $name $*
      else
         #echo "Running a Shell script"
         ksh  $name $*
      fi

   fi

   echo $RUN_USER ended $name at `date +%Y%m%d_%H%M%S`
} > $log 2>&1
chmod 666 $log 2>/dev/null

arch_file $log

mail_file=/opt/apps/bin/mail.tab
new_subject=""
errors=0

sub_header="`hostname`:$ORACLE_SID:  $name"

# Email ids - if jobs not maintained in mail.tab, then use this as default.
GLOBAL_PASS_MAIL_TO="user@email.com"
GLOBAL_FAIL_MAIL_TO="user@email.com"

# Add conditions or use a list of error strings
# if [ `grep -c ORA- $log` -gt 0 ] || [ `grep -ci ERROR $log` -gt 0 ]
if [ `grep -c ORA- $log` -gt 0 ]
then
   errors=1
   echo "ERRORS Encountered. Mailing USERS"
   MAIL_SUBJECT="ERROR:$sub_header run failed"
else
   MAIL_SUBJECT="SUCCESS:$sub_header successfully ran "
fi

# Mail file fields
# 1 - name of the script
# 2 - Mail option. I - ignore (only errors), A - all mails - success and failures.
# 3 - Email list when job is successful.
# 4 - Email list when job failed.

MAIL_OPTION=`grep ^$name $mail_file | cut -f2 -d"|"`
MAIL_SUCCESS=`grep ^$name $mail_file | cut -f3 -d"|"`
MAIL_FAIL=`grep ^$name $mail_file | cut -f4 -d"|"`

if [[ -z $MAIL_OPTION ]]
then
   MAIL_OPTION="I"
fi

if [[ $errors = 0 && $MAIL_OPTION = "I" ]]
then
   echo "No errors and option is set to <$MAIL_OPTION>. Exiting"
   exit 0
fi

if [[ -z $MAIL_SUCCESS ]]
then

   MAIL_SUCCESS="$GLOBAL_PASS_MAIL_TO"
else
   MAIL_SUCCESS="$GLOBAL_PASS_MAIL_TO,$MAIL_SUCCESS"

fi
if [[ -z $MAIL_FAIL ]]
then

   MAIL_FAIL="$GLOBAL_FAIL_MAIL_TO"
else
   MAIL_FAIL="$GLOBAL_FAIL_MAIL_TO,$MAIL_FAIL"

fi
#echo "MAIL SUCCESS = $MAIL_SUCCESS -y"
#echo "MAIL FAIL = $MAIL_FAIL -y"

if [[ $errors = 1 ]]
then
   mailx -s "$MAIL_SUBJECT" $MAIL_FAIL < $log
else
   mailx -s "$MAIL_SUBJECT" $MAIL_SUCCESS < $log
fi

exit 0

mail.tab - Mail id file

mail.tab file contains script name, whether to notify users for success/failure or both, email list when successful and failures.


# Field position below - separated by pipe "|"
# Script Name |MAIL_OPTION| MAIL SUCCESS | MAIL FAIL
# First argument should match the script name sent to "run.sh"
# MAIL_OPTION = A, send mail always. "I" - send only when errors. When errors are found, irrespective of this value, mail is sent.
# MAIL_SUCCESS = list of email accounts receiving the mail when program completes successfully.
# MAIL_FAIL = list of email accounts (comma delimited) to receive email when program fails.
# Change "run.sh" GLOBAL_PASS/FAIL_MAIL_TO to include email address for users who wish to receive *ALL EMAILS*.
#
script1.sh|A|user1@email.com, user2@email.com|user1@email.com
sqlscript1.sql|I|user1@email.com|user1@email.com, user2@email.com

Export on-prem Oracle database and import into RDS using datapump

Scheduling jobs with crontab