Would switching my cuts below to sed improve performance? I am trying to get a per-date count of requests for the last two weeks from a server log. The script runs, but slowly, finishing in around 14 minutes. The file is 8,867,820 lines long and around 1.9 GB. I would guess grep, sed, or awk could do this more efficiently, but my initial attempts failed and I resorted to cut.
Is my piping and redirection causing unnecessary delay, or is this simply an issue of processing a large file?
# Log is in common log format
# host ident authuser date request status bytes
# 127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
#initialize check date and end date
ckdate=$(date --date='1 day ago' +%s)
#enddate=$(date --date='1 fortnight ago' +%s)
enddate=$(date --date='3 days ago' +%s) #test date for shorter term
#feed log in reverse into loop
# tac prints the log backwards (newest entry is at the bottom of the file)
# ckdate holds the epoch seconds of the date on the line most recently read
tac "/etc/httpd/logs/access_log" | \
while IFS= read -r line && (( ckdate >= enddate ))
do
#send line into output, cutting out the date-time field (-f4),
# keeping the date only (-d: -f1), and reformatting it as YYYY-MM-DD
echo "$line" | cut -d ' ' -f4 | tr -d '[' \
| cut -d: -f1 | tr '/' '-' | xargs -I{} date -d '{}' +'%Y-%m-%d'
#update the check date from the current line, as seconds since 1970
ckdate=$(echo "$line" | cut -d ' ' -f4 | tr -d '[' \
| cut -d: -f1 | tr '/' '-' | xargs -I{} date -d '{}' +'%s')
done | sort | uniq -c | head -n -1 | head -n 2
#put output into a sorted list with uniq counts, and trim the partial boundary days
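For comparison, here is what a single-pass awk version of the counting might look like. The loop above forks several processes (cut, tr, xargs, date) for every one of the ~8.9 million lines, which is the likely bottleneck, whereas awk can do the field splitting and counting in one process. This is a self-contained sketch run against a tiny hypothetical sample log (sample.log stands in for the real access_log, and the cutoff is a fixed YYYY-MM-DD string compared lexicographically, so no per-line date calls are needed):

```shell
# Hypothetical sample standing in for /etc/httpd/logs/access_log
cat > sample.log <<'EOF'
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326
127.0.0.1 user-identifier frank [10/Oct/2000:14:01:02 -0700] "GET /b.gif HTTP/1.0" 200 1024
127.0.0.1 user-identifier frank [11/Oct/2000:09:00:00 -0700] "GET /c.gif HTTP/1.0" 200 512
EOF

# For real use this would be: cutoff=$(date --date='14 days ago' +%Y-%m-%d)
cutoff='2000-01-01'

counts=$(tac sample.log | awk -v cutoff="$cutoff" '
BEGIN {
    # map month abbreviations to numbers once, instead of calling date per line
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m)
    for (i = 1; i <= n; i++) mon[m[i]] = i
}
{
    # $4 looks like "[10/Oct/2000:13:55:36"; strip "[" and split on "/" and ":"
    split(substr($4, 2), t, "[/:]")
    d = sprintf("%s-%02d-%s", t[3], mon[t[2]], t[1])   # YYYY-MM-DD
    if (d < cutoff) exit   # input is reversed, so only older lines remain
    cnt[d]++
}
END { for (day in cnt) print cnt[day], day }' | sort -k2)

echo "$counts"
```

On the sample this prints "2 2000-10-10" followed by "1 2000-10-11". ISO-format date strings sort correctly as plain text, which is why a lexicographic comparison against the cutoff works and no epoch conversion is needed inside the loop.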