
I have the below command.

 unzip -p GLP.K4C.S06F5.BG57218-rdf.zip | grep  ":taxonomies-" | head -1

which gives me the output as,

    <j.2:Taxo_Version rdf:resource="refmat:taxonomies-8.2.0"/>

However, I need to extract only taxonomies-8.2.0 instead of the full string above.

In the future, please provide an example of your input file, it lets us give you more specific answers than just a single line. For example, my solutions below might work for other lines of your file but I can't know since you haven't shown it. –  terdon Oct 21 '14 at 16:28

2 Answers

If you know how many : characters precede the value in your input, you could do something like this.

echo ' <j.2:Taxo_Version rdf:resource="refmat:taxonomies-8.2.0"/>' | 
awk -F: '{print $4}' | sed 's/...$//'

The awk command splits the line on the : delimiter and prints the 4th field, and the sed command removes the last 3 characters ("/>) to get the desired output. Note the single quotes around the echo argument: with double quotes, the shell would strip the inner " characters and the sample would no longer match the real input.

However, whether this method works depends on your input, as terdon points out in his comment.

EDIT

The final pipe to sed can be avoided with a sub() call, as jasonwryan suggested in the comments. The command then becomes,

 echo ' <j.2:Taxo_Version rdf:resource="refmat:taxonomies-8.2.0"/>' | 
 awk -F: '{sub(/"\/>/,""); print $4}'

Another solution using just cut and rev can be framed as,

echo ' <j.2:Taxo_Version rdf:resource="refmat:taxonomies-8.2.0"/>' | 
cut -d ':' -f4 | rev | cut -c 4- | rev

Again, the choice of delimiter depends on the input file. In the example you have provided, the characters to extract come after the 3rd occurrence of the delimiter, that is, in the 4th field. I use cut to extract that field, then use the good old rev technique: reverse the string, drop its first 3 characters (the trailing "/> of the original), and apply rev again to get the actual string.
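
As a sketch of the two-delimiter idea raised in the comments: if awk splits on both : and ", the version becomes a field of its own and nothing has to be trimmed afterwards. The sample line is the one from the question. The field number 5 is an assumption that only holds for lines shaped like that one.

```shell
# Sample line copied from the question; single quotes keep the inner
# double quotes intact.
line=' <j.2:Taxo_Version rdf:resource="refmat:taxonomies-8.2.0"/>'

# Split on either ':' or '"'. For this line the fields are:
#   $1=' <j.2'  $2='Taxo_Version rdf'  $3='resource='
#   $4='refmat' $5='taxonomies-8.2.0'  $6='/>'
version=$(printf '%s\n' "$line" | awk -F'[:"]' '{print $5}')

printf '%s\n' "$version"
```

If the real file has lines with a different number of : or " characters before the value, the field index would need adjusting.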

...or split fields on two delimiters (-F"[:"]) or use gsub to avoid the pipe to sed. –  jasonwryan Oct 21 '14 at 16:31
    
@jasonwryan, Thanks. But, I am afraid I am not following your suggestion. Could you please let me know how I could improve the command further? –  Ramesh Oct 21 '14 at 16:39
    
@jasonwryan, added another example without using awk or sed. Hope this one is little better. –  Ramesh Oct 21 '14 at 16:46
Either of these will work (and don't require another process): awk -F: '{sub(/\/>/,""); print $4}' or awk -F'[:/]' '{print $4}': let Awk do the lifting... :) –  jasonwryan Oct 21 '14 at 18:02
    
@jasonwryan, thanks a lot. I added your suggestion to the answer. :) –  Ramesh Oct 21 '14 at 18:08

One way is to use grep's -o option, combined with the power of PCREs (-P):

   -o, --only-matching
          Print  only  the  matched  (non-empty) parts of a matching line,
          with each such part on a separate output line.
   -P, --perl-regexp
          Interpret  PATTERN  as  a  Perl  regular  expression  (PCRE, see
          below).  This is highly experimental and grep  -P  may  warn  of
          unimplemented features.

So, you could do

 unzip -p GLP.K4C.S06F5.BG57218-rdf.zip | grep -oP ':\Ktaxonomies-[^"]*' | head -1

The \K causes anything matched up to that point to be dropped from the reported match (so the : is not printed), and [^"]* means "match as many non-" characters as possible".
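
Since -P depends on grep having been built with PCRE support, a variant without it may be handy. This is only a sketch, assuming the wanted text always begins with the literal taxonomies-, so that -o alone can start the match there and \K is not needed. The sample line is the one from the question.

```shell
# Sample line copied from the question.
line=' <j.2:Taxo_Version rdf:resource="refmat:taxonomies-8.2.0"/>'

# -o prints only the matched part; the match starts at "taxonomies-"
# and runs up to (not including) the next double quote.
version=$(printf '%s\n' "$line" | grep -o 'taxonomies-[^"]*' | head -1)

printf '%s\n' "$version"
```

Unlike the \K version, this pattern does not require a : in front of taxonomies-, so it would also match that word elsewhere on a line.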

Other options include:

  1. sed

    unzip -p GLP.K4C.S06F5.BG57218-rdf.zip | 
        sed -n 's/.*:\(taxonomies-[^"]*\).*/\1/p' | head -1
    

    The -n causes sed to print nothing unless explicitly told to and the s/// is the substitution operator. It will replace everything on the line with the part of the line between the parentheses (\1). The p causes the resulting line to be printed.

  2. Perl

    unzip -p GLP.K4C.S06F5.BG57218-rdf.zip | 
      perl -lne 's/.*:(taxonomies-[^"]*).*/$1/ && print' | head -1
    

    The same basic idea as the sed. If the substitution was successful, the line is printed. An alternative would be

    unzip -p GLP.K4C.S06F5.BG57218-rdf.zip | 
      perl -lne '/.*:(taxonomies-[^"]*)/ && print $1' | head -1
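
The trailing head -1 can also be folded into the tool itself by quitting after the first match. A sketch with sed follows, using a small two-line input in place of the zip contents. The second line of that input is invented purely for the demonstration.

```shell
# Hypothetical input: the first line is from the question, the second
# line is made up to show that only the first match is reported.
sample='    <j.2:Taxo_Version rdf:resource="refmat:taxonomies-8.2.0"/>
    <j.2:Other rdf:resource="refmat:taxonomies-9.9.9"/>'

# On the first line containing ":taxonomies-", print the captured part
# and quit, so later lines are never processed.
version=$(printf '%s\n' "$sample" |
    sed -n '/:taxonomies-/{s/.*:\(taxonomies-[^"]*\).*/\1/p;q;}')

printf '%s\n' "$version"
```

With the real data, the printf would be replaced by the unzip -p command from the question.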
    
Still not getting expected output. This will give you correct output [cpc22776 141029_134901]$ unzip -p GLP.K4C.S1BB7.BG49087-rdf.zip | sed -n 's/.*:(taxonomies-[^"].).*/\1/p' | head -1 taxonomies-07 but when it has taxonomy version it prints . after 8---- cpclb2a670:/usr/local/afs7/PaF/LNK4C/C2B_75/LN_input/zip_processed/0KFQ/FRA-DEV_176YYY-K4C_28Oct/141029_062955_load $ unzip -p GLP.K4C.S0700.BG75448-rdf.zip | sed -n 's/.*:(taxonomies-[^"].).*/\1/p' | head -1 taxonomies-8. –  Atil Thakor Oct 29 '14 at 19:05
    
@AtilThakor yes, that's why I said you need to show your input file. Please edit your question and add an example of your file. –  terdon Oct 29 '14 at 21:54
