I have a script which is having too many modules. The script is working perfectly, but the only issue is it is taking too much time and I need to reduce the complexity of the script.
I have a source file (it can have multiple rows of data, each column separated by a '|'). The script should check all the rows for all the columns and find the columns which is having incorrect data (e.g., null in a non-null column, space in a not space column, alphabets in a number only column, and any value which is not a valid value for that column).
Now I have a master file, which has those column names and the position of the occurrence of that column in the source file with some indicators which will decide what all we have to perform on that column.
Example of master file:
MBR_SRC_SYS,15,H,N,N,N,1,FAC|NSC|WGS|ACE|CS9|CHP CDH_AMT,27,H,1,1,1,N,N
There are 15 columns in my source file will be MBR_SRC_SYS and 'H' Indicates I have to perform a hard error check. Later is the indicator for the other checks like not null, not space, number format, and valid value and the last contains the valid values this particular column can have.
Format of master file:
Clm Name,Position,HardErrorCheck,NullCheck,SpaceCheck,NumberCheck,ValidValueCheck,ValidValues
Example of source file:
DTL|CLM| |RMB_CLM_2015_V01|RMBFCSNSC|15135NSC|rmb_fcsdmsclm_n.dat|rmb_fcsdmsclm_n.trg|NSC 2015093QA01109920150514 4 3132PD|WFS|2015093QA011099|2015/05/14 04:31:32|131|99|NAP||27002|6MB09| | |2014/10/04 00:00:00|2015/05/15 00:00:00|2015/05/15 00:00:00|2015/05/15 00:00:00|2015/05/15 00:00:00|97.26|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0.00|0|0|WIFE |A|SMITH |603T40775 |270022222 |603T40775 |20 |F|1966/12/02 00:00:00|SUB|MD|SN|G2001|NTINCR|MFFFF| |91|91| |NSC|NAIIH11X|H| | | | | | | | | | | | |2015/05/15 00:00:00| | | | | | | | | | | | | | |W|D|N|P|PF|OP|N|CHK|0001025000| 7021693440|351159676 |Y| |I~IN~ ~ ~ ~ | ~ ~ | | | | | | | | |0.00|0.00|0.00| | | | | | | | | | | | | | | | |
What I am doing is having 2 while
loops. With one loop I am reading the lines of the source file. Now inside this while
loop after reading I am having another while
loop to read the master file and do all the necessary checks.
Suppose my source file is having 500 records with 100 columns. Total loop count will be 500*100 = 50000 iterations. This is too much, because my source file can have as many as 5 million records.
I am new to Unix, so I am not sure, but scripting languages like PHP have a timeout and will definitely not be able to run this long. So what should be the solution for this?
#Log file creation
#exec 1> $CODE/WCC_FOA_RMBHUB/logs/RMB_HARD_ERROR_CHECK_ROWMAJOR_$(date +"%Y%m%d_%H%M%S").log 2>&1
SrcFilePath=$PMDIR/SrcFiles/WCC_FOA_RMBHUB
ListFilePath=$CODE/WCC_FOA_RMBHUB/lists
SrcFileName=$1
HardErrorFile=HARD_ERR_RMB_TEST_FILE.dat
MasterFile=$2
NullCheck()
{
value="$1"
if [ ! -n "$value" ] ; then
return 10
else
return 1
fi
}
SpaceCheck()
{
value="$1"
NullCheck "$value"
t=$?
if [ "$t" != "10" ]; then
value=$(echo $value)
if [ ! -z "$value" -a "$value" != " " ]; then
return 1
else
return 10
fi
else
return 1
fi
}
NumberCheck()
{
value="$1"
if [ $value -eq $value 2>/dev/null ]; then
return 1
else
return 10
fi
}
ValidCheck()
{
value="$1"
# Replace space with 'SPACE'
SpaceCheck "$value"
t1=$?
if [ "$t1" == "10" ]; then
value="SPACE"
fi
ValidFromMapping="$2"
needle="|"
NumberOfDelimeter=$(echo $ValidFromMapping | awk 'BEGIN{FS="|"} {print NF}')
v=1
vf=0
while [ $NumberOfDelimeter -gt 0 ]
do
#echo YES
ValidValue=$(echo "$ValidFromMapping" | cut -d'|' -f$v)
#echo "Checking with the value $ValidValue."
if [ "$ValidValue" == "$value" ]; then
vf=1
fi
v=$(expr $v + 1)
NumberOfDelimeter=$(expr $NumberOfDelimeter - 1)
done
if [ $vf -eq 1 ]; then
#echo "Match Found"
return 1
else
#echo "No Match Found"
return 10
fi
}
SoftErrThreshHold=0
#echo "Errrr... We got a hard error check column. We have to check $ColumnName in the whole god dam source table. This column is available in column $ColNum .We got to do it brah !!!\n"
while read RowOne
do
HardErrorFlag=1
SoftErrorFlag=1
RecordColumn=$(echo "$RowOne" | cut -d'|' -f1)
#echo "Started New Row"
if [ "$RecordColumn" == "DTL" ]; then
KeyColumn=$(echo "$RowOne" | cut -d'|' -f9)
ErrorText="$KeyColumn |HARD ERROR OCCURRED IN THE FOLLOWING COLUMN(S):"
SoftErrorText="$KeyColumn |SOFT ERROR OCCURRED IN THE FOLLOWING COLUMN(S):"
while read line
do
ColumnName=`echo $line | cut -d',' -f1`
ColNum=`echo $line | cut -d',' -f2`
ErrorCheck=`echo $line | cut -d',' -f3`
NullCheckStatus=`echo $line | cut -d',' -f4`
SpaceCheckStatus=`echo $line | cut -d',' -f5`
NumberCheckStatus=`echo $line | cut -d',' -f6`
ValidCheckStatus=`echo $line | cut -d',' -f7`
ValidValues=`echo $line | cut -d',' -f8`
# Hard Error Check -
if [ "$ErrorCheck" == "H" ]; then
#echo "Doing Hard Error"
IsNull=1;
IsNumber=1;
IsSpace=1;
IsValid=1;
HardErrorCheckColumnValue=$(echo "$RowOne" | cut -d'|' -f$ColNum)
#echo "Value is $HardErrorCheckColumnValue."
#Funtions Return 10 if it is NULL/SPACE/NOT A NUMBER
#echo $NullCheckStatus
if [ "$NullCheckStatus" == "1" ]; then
NullCheck "$HardErrorCheckColumnValue"
IsNull=$?
fi
#echo $NumberCheckStatus
if [ "$NumberCheckStatus" == "1" ]; then
NumberCheck "$HardErrorCheckColumnValue"
IsNumber=$?
NullCheck "$HardErrorCheckColumnValue"
IsNull=$?
SpaceCheck "$HardErrorCheckColumnValue"
IsSpace=$?
fi
#echo $SpaceCheckStatus
if [ "$SpaceCheckStatus" == "1" ]; then
SpaceCheck "$HardErrorCheckColumnValue"
IsSpace=$?
fi
#echo $ValidCheckStatus
if [ "$ValidCheckStatus" == "1" ]; then
ValidCheck "$HardErrorCheckColumnValue" "$ValidValues"
IsValid=$?
fi
if [ $IsNull -eq 10 ] || [ $IsNumber -eq 10 ] || [ $IsSpace -eq 10 ] || [ $IsValid -eq 10 ]; then
HardErrorFlag=10
if [ $IsNull -eq 10 ]; then
ErrorText=$ErrorText" $ColumnName is Null or Invalid; "
fi
if [ $IsNumber -eq 10 ]; then
ErrorText=$ErrorText" $ColumnName is not a valid number; "
fi
if [ $IsSpace -eq 10 ]; then
ErrorText=$ErrorText" $ColumnName is Space or Invalid; "
fi
if [ $IsValid -eq 10 ]; then
ErrorText=$ErrorText" $ColumnName is not a valid value; "
fi
fi
fi
# Soft Error Check -
if [ "$ErrorCheck" == "S" ]; then
#echo "Doing Soft Error"
IsSValid=1
IsSNull=1;
IsSNumber=1;
IsSSpace=1;
SoftErrorCheckColumnValue=$(echo "$RowOne" | cut -d'|' -f$ColNum)
#echo "Value is $SoftErrorCheckColumnValue."
#Funtions Return 10 if it is NULL/SPACE/NOT A NUMBER
#echo $NullCheckStatus
if [ "$NullCheckStatus" == "1" ]; then
NullCheck "$SoftErrorCheckColumnValue"
IsSNull=$?
fi
#echo $NumberCheckStatus
if [ "$NumberCheckStatus" == "1" ]; then
NumberCheck "$SoftErrorCheckColumnValue"
IsSNumber=$?
NullCheck "$SoftErrorCheckColumnValue"
IsSNull=$?
SpaceCheck "$SoftErrorCheckColumnValue"
IsSSpace=$?
fi
#echo $SpaceCheckStatus
if [ "$SpaceCheckStatus" == "1" ]; then
SpaceCheck "$SoftErrorCheckColumnValue"
IsSpace=$?
fi
#echo $ValidCheckStatus
if [ "$ValidCheckStatus" == "1" ]; then
ValidCheck "$SoftErrorCheckColumnValue" "$ValidValues"
IsSValid=$?
fi
if [ $IsSNull -eq 10 ] || [ $IsSNumber -eq 10 ] || [ $IsSSpace -eq 10 ] || [ $IsSValid -eq 10 ]; then
SoftErrorFlag=10
if [ $IsSNull -eq 10 ]; then
SoftErrorText=$SoftErrorText" $ColumnName is Null or Invalid; "
fi
if [ $IsSNumber -eq 10 ]; then
SoftErrorText=$SoftErrorText" $ColumnName is not a valid number; "
fi
if [ $IsSSpace -eq 10 ]; then
SoftErrorText=$SoftErrorText" $ColumnName is Space or Invalid; "
fi
if [ $IsSValid -eq 10 ]; then
SoftErrorText=$SoftErrorText" $ColumnName is not a valid value; "
fi
SoftErrThreshHold=$(expr $SoftErrThreshHold + 1)
fi
fi
done <$ListFilePath/$MasterFile
if [ $HardErrorFlag -eq 10 ]; then
echo $ErrorText
fi
if [ $SoftErrorFlag -eq 10 ]; then
echo $SoftErrorText
fi
fi
done <$SrcFilePath/$SrcFileName
if [ $SoftErrThreshHold -eq 0 ]; then
echo "No Soft Error was found in the file"
else
echo "Total Count Of SofErrors Found : $SoftErrThreshHold"
fi