
Given a file containing web access logs for a YouTube video, where every line is a hit in the following format:

62.172.72.131 - - [02/Jan/2003:02:06:41 -0700] "GET /random/html/riaa_hacked/ HTTP/1.0" 200 10564 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0; WWP 17 August 2001)"    
63.194.21.74 - - [30/Apr/2003:13:13:22 -0700] "GET /random/video/Star_Wars_Kid_Remix.wmv HTTP/1.1" 206 1146708 "-" "NSPlayer/9.0.0.2980 WMFSDK/9.0"    
161.114.88.73 - - [02/May/2003:03:27:41 -0700] "GET /random/video/Star_Wars_Kid.php HTTP/1.0" 302 1 "http://friends.portalofevil.com/sp.php?si=3&fi=FRIENDSOF&ti=1000489621&pi=1000489621" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; compaq)"    
64.164.63.70 - - [02/May/2003:13:24:19 -0700] "GET /random/video/Star_Wars_Kid.wmv HTTP/1.1" 302 307 "http://blogdex.media.mit.edu/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)"

I need to extract the IP address and the date inside the square brackets and write them to a CSV file, i.e. ip_address,date.

I'm using the following commands to get the IP and the date respectively:

grep -oP '([0-9]{1,3}\.){3}[0-9]{1,3}' test.log
grep -oP "\[\K[^\]]+" test.log

I don't know how to combine them into one comma-separated string per line for the CSV. My attempt with tr is incomplete:

tr '\n' > file.csv

Since this is a large log file, I thought Unix commands would handle it efficiently. Is there a difference between using Unix commands and doing it in Python (reading each line, manipulating the string, then writing to a file)?
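One way to combine the two greps without any extra programming is to capture each stream to a file and stitch them together with paste. This is a sketch, not a definitive answer: it assumes GNU grep (for -P), that every line matches both patterns so the two streams stay aligned, and it adds a ^ anchor to the IP pattern so IP-like strings later in the line (e.g. in the referrer) are not picked up:

```shell
# Extract IPs (anchored to line start) and bracketed dates separately,
# then join them line-by-line with a comma.
grep -oP '^([0-9]{1,3}\.){3}[0-9]{1,3}' test.log > ips.txt
grep -oP '\[\K[^\]]+' test.log > dates.txt
paste -d, ips.txt dates.txt > file.csv
```

If a line ever fails one of the two patterns, the streams drift out of sync, which is why the single-pass awk and sed answers below are more robust.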


You'd do better using awk:

awk '{print $1,$4,$5;}' test.log

awk breaks up each line on whitespace, letting you refer to the fields as $1, $2, ... etc., and then you just print the first, fourth and fifth fields ($4 and $5 make up the date stamp).

+1, awk is the right tool for this job. but OP wanted two fields, not three. try awk '{print $1, $4 " " $5}' test.log instead. – cas Feb 28 at 23:56
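To get genuinely comma-separated output (rather than space-separated fields still wrapped in brackets), a small extension of the awk above joins the two date fields and strips the brackets; the /[][]/ bracket expression matches both [ and ]:

```shell
# Join $4 and $5 into the timestamp, delete the surrounding brackets,
# and print ip,timestamp as a CSV row.
awk '{ts = $4 " " $5; gsub(/[][]/, "", ts); print $1 "," ts}' test.log > file.csv
```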

Use RE substitutions in sed; here \1, \2, ... are assigned the values between the corresponding \( and \):

sed 's/^\([0-9.]*\) - - \[\(.*\)\] "GET .*/\1,\2/' test.log

(you may of course substitute more exact patterns in the parentheses)


With a Unix command you could use the following sed:

sed -e 's/\(\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}\).*\[\(.*\)\].*/\1,\3/' test.log

But if the logfile is very big, or if you need further processing beyond the extraction, you may prefer Python: it also streams line by line (e.g. with the fileinput module or a generator), so memory use stays constant, and it is easier to extend.
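As a hypothetical sketch of the Python approach this answer alludes to (the function and file names are illustrative, not from the original), a generator keeps memory constant by yielding one (ip, date) pair per line, and the csv module handles the output:

```python
import csv
import re

# Match the leading IP and the bracketed timestamp of one log line.
LINE_RE = re.compile(r'^((?:\d{1,3}\.){3}\d{1,3}) - - \[([^\]]+)\]')

def extract(lines):
    """Yield (ip, date) pairs from an iterable of log lines,
    skipping lines that do not match."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield m.group(1), m.group(2)

if __name__ == "__main__":
    # Both files are streamed, so this works on arbitrarily large logs.
    with open("test.log") as src, open("file.csv", "w", newline="") as dst:
        csv.writer(dst).writerows(extract(src))
```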

