I have a file in which each line looks like this:

"372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"

I want to extract the URLs from the file -- https://example-url.com

I tried this regex with the sed command -- sed -n '/"^"http/,/"^"/p'

But it didn't solve my problem.

Pardon me for this problem, as I am a beginner in sed and regex.

3 Answers

Accepted answer

You could use this

sed -n 's!^.*\^"\(http[^^]*\)"^.*!\1!p'

The potential gotcha for a beginner to REs is that ^ anchors the match to the start of the line when it appears at the start of an RE, so you have to escape it as \^ if you want a literal caret (up-arrow) there.
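To see the difference between the anchor and the literal character, here is a small sketch. The three input lines are made up for illustration and are not from the question's file.

```shell
# Three hypothetical test lines: one starts with a caret, one has a
# caret in the middle, one has none.
input=$(printf '%s\n' '^start' 'mid^dle' 'plain')

# /^\^/ : unescaped ^ anchors to start of line, \^ is a literal caret,
# so only lines that begin with a caret are printed.
anchored=$(printf '%s\n' "$input" | sed -n '/^\^/p')

# /\^/ : a literal caret anywhere in the line.
anywhere=$(printf '%s\n' "$input" | sed -n '/\^/p')

printf 'anchored:\n%s\n' "$anchored"
printf 'anywhere:\n%s\n' "$anywhere"
```

The first command should print only ^start, while the second should print both ^start and mid^dle.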

The RE pattern match can be explained as follows:

  • ^.*\^" -- Match from start of line up to the last possible up-arrow double-quote ^" that still lets the rest of the pattern match
  • \( -- Start a capture group, which the replacement can refer to as \1
  • http[^^]* -- Match http followed by as many characters that are not ^ as possible
  • \) -- End the capture group
  • "^.* -- Match double-quote and up-arrow "^, then as much as possible (until end of line)

This entire match is replaced by \1, which is the captured text starting with http. Because the pattern spans the whole line, only the URL remains, and the combination of -n and the p flag prints only the lines where the substitution succeeded.
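As a quick check, the command can be run against the line from the question. This is only a sketch: the temporary file created with mktemp is a stand-in for your real file, and the variable names are made up.

```shell
# Hypothetical sample file holding the single line from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"
EOF

# Same sed command as above, reading from the file instead of stdin.
url=$(sed -n 's!^.*\^"\(http[^^]*\)"^.*!\1!p' "$sample")
printf '%s\n' "$url"

rm -f "$sample"
```

For this sample line it should print https://example-url.com, without the surrounding double quotes. Lines that contain no http field are not printed at all.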

Thanks for the explanation ..... –  Anurag Sharma 4 hours ago

Try this:

echo "372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED" | cut -f9 -d^
I think you meant to wrap your echo command in single quotes '. Otherwise you lose the double quotes when you echo. –  Dave yesterday
    
The given line itself also contains a single quote ' (in MEN'S). Hence I avoided that. –  SHW yesterday
    
But without the single quote fields like f 10 become arrays. It doesn't impact the url itself really, but it's easy enough to echo the result of this to get rid of the double quotes if needed. –  Dave yesterday
    
The cut is useful but doesn't remove the double quotes surrounding the extracted field. tr -d '"' maybe? –  roaima yesterday
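Putting the comments above together, here is a sketch of cut reading from a file, with tr stripping the quotes. In the echo version the shell itself removes the double quotes before cut sees them, but when the data comes from a file they stay in the field. The temporary file is a stand-in for the real one, and field 9 is assumed to hold the URL on every line.

```shell
# Hypothetical sample file holding the single line from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"
EOF

# Field 9 with ^ as the delimiter still carries its double quotes.
quoted=$(cut -d'^' -f9 "$sample")

# tr -d '"' deletes every double quote from the extracted field.
url=$(cut -d'^' -f9 "$sample" | tr -d '"')

printf '%s\n' "$quoted"
printf '%s\n' "$url"

rm -f "$sample"
```

The first printf should show the URL still wrapped in double quotes, the second the bare URL. Note that this relies on the field position, so it only works if no earlier field contains a ^ of its own.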

If your version of grep supports PCRE mode, you could try

grep -Po '(?<="\^")http.+?(?="\^")'
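For completeness, a sketch of that command applied to the sample line. It assumes GNU grep built with PCRE support (the -P option is not available in every grep), and the temporary file is again a stand-in for the real one.

```shell
# Hypothetical sample file holding the single line from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"372"^""^"2015-09-03 06:59:44.475"^"NEW"^"N/A"^""^0^"105592"^"https://example-url.com"^"example-domain < MEN'S ULTRA < UltraSeriesViewAll (18)"^"New"^"MERCHANT_PROVIDED"
EOF

# -P enables Perl-compatible regexes, -o prints only the matched text.
# (?<="\^") requires "^" just before the match and (?="\^") requires it
# just after; neither is included in the output. .+? is non-greedy, so
# the match stops at the first closing "^".
url=$(grep -Po '(?<="\^")http.+?(?="\^")' "$sample")
printf '%s\n' "$url"

rm -f "$sample"
```

For this sample line it should print https://example-url.com. Since the lookarounds consume nothing, there is no need to strip quotes afterwards.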