Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems. It's 100% free, no registration required.

I want to extract some information from raw Apache logs in combined log format:

51.254.56.62 - - [01/Jun/2016:20:49:28 +0500] "GET /vendors/jquery.slimscroll.min.js HTTP/1.1" 404 - "http://networkconfig.net/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
51.254.56.62 - - [01/Jun/2016:20:49:28 +0500] "GET /jquery.fullPage.js HTTP/1.1" 304 - "http://networkconfig.net/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
51.254.56.62 - - [01/Jun/2016:20:49:29 +0500] "GET /js/TweenLite.min.js HTTP/1.1" 304 - "http://networkconfig.net/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
51.254.56.62 - - [01/Jun/2016:20:49:29 +0500] "GET /js/EasePack.min.js HTTP/1.1" 304 - "http://networkconfig.com/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
51.254.56.62 - - [01/Jun/2016:20:49:29 +0500] "GET /js/rAF.js HTTP/1.1" 304 - "http://networkconfig.com/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
51.254.56.62 - - [01/Jun/2016:20:49:29 +0500] "GET /js/networkconfig.js HTTP/1.1" 304 - "http://networkconfig.com/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
182.180.10.40 - - [01/Jun/2016:20:49:29 +0500] "GET /js/rAF.js HTTP/1.1" 304 - "http://networkconfig.com/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
182.180.10.40 - - [01/Jun/2016:20:49:29 +0500] "GET /js/networkconfig.js HTTP/1.1" 304 - "http://networkconfig.com/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
182.180.10.40 - - [01/Jun/2016:20:49:28 +0500] "GET /vendors/jquery.slimscroll.min.js HTTP/1.1" 404 - "http://networkconfig.net/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
182.180.10.40 - - [01/Jun/2016:20:49:28 +0500] "GET /jquery.fullPage.js HTTP/1.1" 304 - "http://networkconfig.net/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"

This is what I have done:

  awk '{ print $1,$11}' accesslog | sort | uniq -c | sort -nr | head -n 10

  3 51.254.56.62 "http://networkconfig.net/"
  3 51.254.56.62 "http://networkconfig.com/"
  2 182.180.10.40 "http://networkconfig.net/"
  2 182.180.10.40 "http://networkconfig.com/"

What I want to get is:

Domains                     Hits By IP

networkconfig.net           3 hits 51.254.56.62  | 2 hits 182.180.10.40 and so on
networkconfig.com           3 hits 51.254.56.62 | 2 hits 182.180.10.40 and so on
    
Please specify the name of the program that made the log file. – agc Jun 2 at 8:42
    
Actually, these are Apache web server domlogs. – rlinux57 Jun 2 at 17:54

Revised version (3rd) of ugly.sh:

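#!/bin/bash
# Usage: ugly.sh LOGFILE
{ echo "Domains  Hits by IP"
  # Reduce the referrer (field 11) to its bare host name, then print "IP<TAB>host".
  # gsub runs as its own statement so its return value (the number of
  # substitutions) is not concatenated onto the IP.
  awk '{ gsub(/^.*:\/\/|\"|\/.*$/, "", $11); print $1 "\t" $11 }' "$1" |
      sort |
      uniq -c |                    # count identical IP/host pairs
      sort -k3,3 -k1,1nr |         # group by host, highest count first
      { n=""                       # host seen on the previous line
        while read -r a b c; do    # a=count, b=IP, c=host
            [ "$a" = 1 ] && p='' || p=s
            if [ "$n" = "$c" ]; then
                printf '  |  %s hit%s %s' "$a" "$p" "$b"
            else
                echo
                printf '%s %s hit%s %s' "$c" "$a" "$p" "$b"
            fi
            n="$c"
        done
        echo
      }
} |
while read -r a b; do              # a=first word, b=rest of the line
    printf "%-30s   %s\n" "$a" "$b"
done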

Output of ./ugly.sh accesslog:

Domains                          Hits by IP

networkconfig.com                3 hits 51.254.56.62  |  2 hits 182.180.10.40
networkconfig.net                3 hits 51.254.56.62  |  2 hits 182.180.10.40

Output of ./ugly.sh log.txt (OP's URL for data: log.txt):

Domains                          Hits by IP

-                                1 hit 180.76.15.138  |  1 hit 192.243.55.136
www.google.com.pk                3 hits 122.129.73.92
www.networkconfigorchard.com     2 hits 39.46.59.57  |  8 hits 39.46.6.0
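
As a variation, the folding step that the while loop performs can also be done in awk. The following is only a sketch under stated assumptions, not a replacement for ugly.sh: the log lines, addresses and host names are made up for illustration, the header line is left out, and field 11 is assumed to hold the quoted referrer as in the sample data above.

```shell
#!/bin/bash
# Hypothetical sample data in combined log format (addresses and hosts are made up).
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jun/2016:20:49:28 +0500] "GET /a.js HTTP/1.1" 304 - "http://example.net/" "UA"
10.0.0.1 - - [01/Jun/2016:20:49:29 +0500] "GET /b.js HTTP/1.1" 304 - "http://example.net/" "UA"
10.0.0.2 - - [01/Jun/2016:20:49:29 +0500] "GET /a.js HTTP/1.1" 304 - "http://example.net/" "UA"
10.0.0.1 - - [01/Jun/2016:20:49:30 +0500] "GET /c.js HTTP/1.1" 304 - "http://example.com/" "UA"
EOF

# Pass 1 counts hits per (host, IP) pair and prints "host<TAB>IP<TAB>count".
# sort then groups the rows by host, highest count first.
# Pass 2 folds each host's rows into a single output line.
report=$(
  awk '{ gsub(/^.*:\/\/|"|\/.*$/, "", $11); hits[$11 "\t" $1]++ }
       END { for (k in hits) print k "\t" hits[k] }' "$log" |
  sort -t "$(printf '\t')" -k1,1 -k3,3nr |
  awk -F '\t' '
       NR == 1 || $1 != host {
           if (NR > 1) print line          # flush the previous host
           host = $1; line = sprintf("%-30s   ", host); sep = ""
       }
       { line = line sep $3 " hit" ($3 == 1 ? "" : "s") " " $2; sep = "  |  " }
       END { if (NR > 0) print line }'
)
printf '%s\n' "$report"
rm -f "$log"
```

With this sample the report has one line per host, example.com first, in the same column layout that ugly.sh prints.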
    
Thanks, but it won't work with the following file: networkconfig.net/test/logs.txt – rlinux57 Jun 1 at 21:17
    
Now fixed to work with 'logs.txt' also. – agc Jun 2 at 2:48
    
Please visit this link: networkconfig.net/test/text.txt. Also, could you send me a description of exactly how the script works? – rlinux57 Jun 2 at 8:05
    
That 'text.txt' must have started from some other log file, and the output has things that I'd hoped to have filtered out. Perhaps a script file would do better: ugly.sh, invoke it like this: ugly.sh logs.txt or whatever the file to test is. If the results come out wrong, send a link to the tricky data for another fix. (How it works: awk rips out some fields, which are sorted in what should be the correct order, the while loop puts the IP hits on the same line as the domain, the sed removes unwanted frills, and printf prints it neatly.) – agc Jun 2 at 8:40
    
Also, maybe the OP spec should be altered to include things like "-" for a Domain. – agc Jun 2 at 8:41
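
To make the "how it works" description in the comments more concrete, here is a small sketch of the first two stages, run on made-up log lines (the addresses and host names are assumptions for illustration only). The referrer clean-up is done here with awk's gsub, as in the revised script, rather than with a separate sed.

```shell
#!/bin/bash
# Hypothetical log lines (made-up addresses and hosts); field 11 is the referrer.
sample='10.0.0.1 - - [01/Jun/2016:20:49:28 +0500] "GET /a.js HTTP/1.1" 304 - "http://example.net/" "UA"
10.0.0.1 - - [01/Jun/2016:20:49:29 +0500] "GET /b.js HTTP/1.1" 304 - "http://example.net/" "UA"
10.0.0.2 - - [01/Jun/2016:20:49:29 +0500] "GET /a.js HTTP/1.1" 304 - "http://example.com/" "UA"'

# Stage 1: awk keeps the client IP and reduces the referrer to its host name.
pairs=$(printf '%s\n' "$sample" |
        awk '{ gsub(/^.*:\/\/|"|\/.*$/, "", $11); print $1, $11 }')

# Stage 2: count identical pairs, then order by host and by descending count.
# The final awk only normalizes the padding that uniq -c puts before the count.
counts=$(printf '%s\n' "$pairs" | sort | uniq -c |
         sort -k3,3 -k1,1nr | awk '{ print $1, $2, $3 }')

printf 'pairs:\n%s\n\ncounts:\n%s\n' "$pairs" "$counts"
```

The remaining stages (the while loop and the final printf) only rearrange the counted rows so that all IPs for one host end up on the same line.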
