
Given a file containing web access logs for a YouTube video, where every line is a hit in the following format:

62.172.72.131 - - [02/Jan/2003:02:06:41 -0700] "GET /random/html/riaa_hacked/ HTTP/1.0" 200 10564 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0; WWP 17 August 2001)"    
63.194.21.74 - - [30/Apr/2003:13:13:22 -0700] "GET /random/video/Star_Wars_Kid_Remix.wmv HTTP/1.1" 206 1146708 "-" "NSPlayer/9.0.0.2980 WMFSDK/9.0"    
161.114.88.73 - - [02/May/2003:03:27:41 -0700] "GET /random/video/Star_Wars_Kid.php HTTP/1.0" 302 1 "http://friends.portalofevil.com/sp.php?si=3&fi=FRIENDSOF&ti=1000489621&pi=1000489621" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; compaq)"    
64.164.63.70 - - [02/May/2003:13:24:19 -0700] "GET /random/video/Star_Wars_Kid.wmv HTTP/1.1" 302 307 "http://blogdex.media.mit.edu/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)"

I need to extract the IP address and the date inside the square brackets and write them to a CSV file, i.e. ip_address,date.

I'm using the following commands to get the IP and the date respectively:

grep -oP '([0-9]{1,3}\.){3}[0-9]{1,3}' test.log
grep -oP "\[\K[^\]]+" test.log

I don't know how to combine them into one comma-separated string per line for the CSV. My attempt with tr is incomplete:

tr '\n' > file.csv

Since this is a large log file, I thought Unix commands would handle it efficiently. Is there a difference between using Unix commands and doing it in Python (reading each line, manipulating the string, then writing to a file)?
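One way to combine the two greps without any extra programming is to capture each stream to a file and stitch them together with paste. This is a sketch, not a definitive answer: it assumes GNU grep (for -P), that every line matches both patterns so the two streams stay aligned, and it adds a ^ anchor to the IP pattern so IP-like strings later in the line (e.g. in the referrer) are not picked up:

```shell
# Extract IPs (anchored to line start) and bracketed dates separately,
# then join them line-by-line with a comma.
grep -oP '^([0-9]{1,3}\.){3}[0-9]{1,3}' test.log > ips.txt
grep -oP '\[\K[^\]]+' test.log > dates.txt
paste -d, ips.txt dates.txt > file.csv
```

If a line ever fails one of the two patterns, the streams drift out of sync, which is why the single-pass awk and sed answers below are more robust.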


You'd do better using awk:

awk '{print $1,$4,$5;}' test.log

awk breaks up each line on whitespace, letting you refer to the fields as $1, $2, ... etc., and then you just print the first, fourth and fifth fields ($4 and $5 make up the date stamp).

+1, awk is the right tool for this job. but OP wanted two fields, not three. try awk '{print $1, $4 " " $5}' test.log instead. – cas Feb 28 at 23:56
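To get genuinely comma-separated output (rather than space-separated fields still wrapped in brackets), a small extension of the awk above joins the two date fields and strips the brackets; the /[][]/ bracket expression matches both [ and ]:

```shell
# Join $4 and $5 into the timestamp, delete the surrounding brackets,
# and print ip,timestamp as a CSV row.
awk '{ts = $4 " " $5; gsub(/[][]/, "", ts); print $1 "," ts}' test.log > file.csv
```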

Use RE substitutions in sed; here \1, \2, ... are assigned the values between the corresponding \( and \):

sed 's/^\([0-9.]*\) - - \[\(.*\)\] "GET .*/\1,\2/' test.log

(you may of course substitute more exact patterns in the parentheses)


With a Unix command you could use the following sed:

sed -e 's/\(\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}\).*\[\(.*\)\].*/\1,\3/' test.log

But if the logfile is very big, or if you need further processing beyond the extraction, you may prefer Python: it also streams line by line (e.g. with the fileinput module or a generator), so memory use stays constant, and it is easier to extend.
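As a hypothetical sketch of the Python approach this answer alludes to (the function and file names are illustrative, not from the original), a generator keeps memory constant by yielding one (ip, date) pair per line, and the csv module handles the output:

```python
import csv
import re

# Match the leading IP and the bracketed timestamp of one log line.
LINE_RE = re.compile(r'^((?:\d{1,3}\.){3}\d{1,3}) - - \[([^\]]+)\]')

def extract(lines):
    """Yield (ip, date) pairs from an iterable of log lines,
    skipping lines that do not match."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield m.group(1), m.group(2)

if __name__ == "__main__":
    # Both files are streamed, so this works on arbitrarily large logs.
    with open("test.log") as src, open("file.csv", "w", newline="") as dst:
        csv.writer(dst).writerows(extract(src))
```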

