I have a text file that contains multiple XML records, and I need to split it into separate files by vendorname. I modified the commands to fit my requirement, but awk gives me two error messages:

awk: sf.awk:1: /<hdr:vendorname xmlns:hdr=\"http:\//www.mycompany.com\/header\/v1\">[A-Z]+<\/hdr:vendorname>/{
awk: sf.awk:1:                                         ^ syntax error
awk: sf.awk:1: /<hdr:vendorname xmlns:hdr=\"http:\//www.mycompany.com\/header\/v1\">[A-Z]+<\/hdr:vendorname>/{
awk: sf.awk:1:                                                               ^ backslash not last character on line

If I add a \ in front of the ., I get a different error message:

awk: sf.awk:1: /<hdr:vendorname xmlns:hdr=\"http:\//dwh\.www.mycompany.com\/header\/v1\">[A-Z]+<\/hdr:vendorname>/{
awk: sf.awk:1:                                         ^ backslash not last character on line

Script

/<hdr:vendorname xmlns:hdr=\"http:\//www.mycompany.com\/header\/v1\">[A-Z]+<\/hdr:vendorname>/{
    split($0, a, "hdr:vendorname xmlns:hdr=\"http:\//www.mycompany.com\/header\/v1">|<\/hdr:vendorname")
    if (out["file_"a[2]".txt"] == "") {
      out["file_"a[2]".txt"] = $0
    }
    else {
      out["file_"a[2]".txt"]=out["file_"a[2]".txt"]"\n"$0
    }
  }

END {
    for (fic in out) {
      printf out[fic] > fic
    }
  }
  • You need to escape both / characters. Fix typos, get code working, then if you like you can post it on codereview.stackexchange.com. Commented Nov 17, 2016 at 23:08
  • Honestly, I wouldn't do it like this. Parsing XML using regular expressions simply doesn't work very well. If you post some sample XML (and desired output) I'll give you an illustration of how it can be done using a parser. Commented Nov 18, 2016 at 15:58

1 Answer

You need to backslash-escape both slashes in http://, not just the first one.

Thus, the pattern should look like this:

/<hdr:vendorname xmlns:hdr=\"http:\/\/www.mycompany.com\/header\/v1\">[A-Z]+<\/hdr:vendorname>/

Additionally, in your call to split(),

  1. you shouldn't backslash-escape the forward slashes in the string;
  2. you should backslash-escape the double-quote (") after v1.

So the string passed to split() should look like this:

"hdr:vendorname xmlns:hdr=\"http://www.mycompany.com/header/v1\">|</hdr:vendorname"

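Putting both fixes together, a minimal sketch of the whole script might look like the following. The sample input (records.txt, with one record per line and the vendor names ACME and GLOBEX) is made up for illustration, since the question doesn't show the real data. I have also swapped printf for print in the END block, which is my own change rather than part of the fix above: printf would treat the record text as a format string, so any % in the data would be misinterpreted.

```sh
#!/bin/sh
# Run this in an empty scratch directory: it creates files in the current directory.

# Hypothetical sample input: one XML record per line (an assumption).
cat > records.txt <<'EOF'
<rec><hdr:vendorname xmlns:hdr="http://www.mycompany.com/header/v1">ACME</hdr:vendorname><id>1</id></rec>
<rec><hdr:vendorname xmlns:hdr="http://www.mycompany.com/header/v1">GLOBEX</hdr:vendorname><id>2</id></rec>
<rec><hdr:vendorname xmlns:hdr="http://www.mycompany.com/header/v1">ACME</hdr:vendorname><id>3</id></rec>
EOF

cat > sf.awk <<'EOF'
# In a /regex/ literal every forward slash must be escaped, including both in http://
/<hdr:vendorname xmlns:hdr=\"http:\/\/www.mycompany.com\/header\/v1\">[A-Z]+<\/hdr:vendorname>/ {
    # In a string, slashes need no escaping, but every double quote does.
    split($0, a, "hdr:vendorname xmlns:hdr=\"http://www.mycompany.com/header/v1\">|</hdr:vendorname")
    f = "file_" a[2] ".txt"          # a[2] is the text between the two separators
    out[f] = (f in out) ? out[f] "\n" $0 : $0
}

END {
    for (f in out) {
        print out[f] > f             # print, so a % in the data is left alone
        close(f)                     # avoid running out of open file handles
    }
}
EOF

awk -f sf.awk records.txt
```

With this sample input, the two ACME records should end up in file_ACME.txt and the GLOBEX record in file_GLOBEX.txt.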