
INPUT:

<tr><td>FOOBAAR</td><td>FOOO</td><td>BAAR</td><td><font style=BACKGROUND-COLOR:red>2014-02-14 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-02-17 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-03-07 13:34</font></td></tr>

OUTPUT:

<tr><td>FOOBAAR</td><td>FOOO</td><td>BAAR</td><td>2014-02-14 13:34</td><td><font style=BACKGROUND-COLOR:red>2014-02-17 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-03-07 13:34</font></td></tr>

Difference: the tags

<font style=BACKGROUND-COLOR:red>

and

</font>

were removed only from the fourth column.

My question: how can I remove given strings from only a given column? The column delimiter is:

</td><td>


4 Answers

Accepted answer (3 votes)

I'd recommend an HTML parsing tool instead of regular expressions. (Famous answer explaining why here)

Here's an example using an XML parser (note: it requires the input to be well-formed XML, which your sample HTML is not, because its attribute values are unquoted):

# change the value of the style attribute of the font tag of the 4th td tag 
# to the empty string
xmlstarlet ed -O -u '//table/tr/td[4]/font[@style]/@style' -v "" <<END
<html><head></head><body><table>
<tr><td>FOOBAAR</td><td>FOOO</td><td>BAAR</td><td><font style="BACKGROUND-COLOR:red">2014-02-14 13:34</font></td><td><font style="BACKGROUND-COLOR:red">2014-02-17 13:34</font></td><td><font style="BACKGROUND-COLOR:red">2014-03-07 13:34</font></td></tr>
</table></body></html>
END
<html>
  <head/>
  <body>
    <table>
      <tr>
        <td>FOOBAAR</td>
        <td>FOOO</td>
        <td>BAAR</td>
        <td>
          <font style="">2014-02-14 13:34</font>
        </td>
        <td>
          <font style="BACKGROUND-COLOR:red">2014-02-17 13:34</font>
        </td>
        <td>
          <font style="BACKGROUND-COLOR:red">2014-03-07 13:34</font>
        </td>
      </tr>
    </table>
  </body>
</html>
    
Thank you, thank you, thank you. I was just about to comment about what you mention in your first paragraph when I scrolled down to your answer. +1 –  0xC0000022L Jun 4 '14 at 2:37

This could work:

#!/bin/sh

# remove specific strings from the fourth column only
INSTRING="<tr><td>FOOBAAR</td><td>FOOO</td><td>BAAR</td><td><font style=BACKGROUND-COLOR:red>2014-02-14 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-02-17 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-03-07 13:34</font></td></tr>"

DEL_STRING1="<font style=BACKGROUND-COLOR:red>"
DEL_STRING2="</font>"
DELIM="</td><td>"
# columns 1-4, rejoined with the delimiter
OUT_FIRST=`echo "$INSTRING" | awk -F "$DELIM" '{print $1,$2,$3,$4}' OFS="$DELIM"`
# drop the opening font tag, then cut at the closing one
OUT_FIRST=`echo "$OUT_FIRST" | awk -F "$DEL_STRING1" '{print $1,$2}' OFS=""`
OUT_FIRST=`echo "$OUT_FIRST" | awk -F "$DEL_STRING2" '{print $1}'`
# everything from column 5 onward, unchanged
OUT_LAST=`echo "$INSTRING" | awk -F "$DELIM" '{print substr($0, index($0,$5))}'`
echo "$OUT_FIRST$DELIM$OUT_LAST"

Hope this helps..


An awk one-liner:

$ awk -F '</td><td>' 'BEGIN{OFS=FS} {gsub(/<font style=BACKGROUND-COLOR:red>/,"",$4); gsub(/<\/font>/,"",$4)}1' file
<tr><td>FOOBAAR</td><td>FOOO</td><td>BAAR</td><td>2014-02-14 13:34</td><td><font style=BACKGROUND-COLOR:red>2014-02-17 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-03-07 13:34</font></td></tr>
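
The one-liner above hard-codes the column number and treats the strings as regular expressions. As a minimal sketch of the same idea with both made into parameters, the function below removes literal strings (no regex interpretation) from a single column. The name `strip_in_column` and its argument order are assumptions made up for this illustration, not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: strip_in_column COLUMN STRING...
# Reads lines on stdin, splits them on </td><td>, and deletes every
# literal occurrence of each STRING from field COLUMN only.
strip_in_column() {
    col=$1
    shift
    awk -F '</td><td>' -v col="$col" '
    # Delete all literal occurrences of lit from s (no regex involved).
    function remove_literal(s, lit,    p, out) {
        if (lit == "") return s
        out = ""
        while ((p = index(s, lit)) > 0) {
            out = out substr(s, 1, p - 1)
            s = substr(s, p + length(lit))
        }
        return out s
    }
    BEGIN {
        OFS = FS
        # Take the strings from the argument list and blank the entries
        # so that awk does not try to open them as input files.
        for (i = 1; i < ARGC; i++) { del[i] = ARGV[i]; ARGV[i] = "" }
        n = ARGC - 1
    }
    {
        # Leave lines with fewer than COLUMN fields untouched.
        if (col <= NF)
            for (i = 1; i <= n; i++) $col = remove_literal($col, del[i])
        print
    }' "$@"
}
```

Usage, assuming the input is in `file`: `strip_in_column 4 '<font style=BACKGROUND-COLOR:red>' '</font>' < file`. Using `index` and `substr` instead of `gsub` means characters such as `.` or `*` in the strings need no escaping.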
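
A sed sketch: put a marker at a chosen delimiter, then apply the edit only to the marked line (the braces in the second command are a placeholder for the actual edit):
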
sed 's|</td><td>|</td>\nTGT_LINE_MARKER<td>|4' |
sed '\|TGT_LINE_MARKER|{function applied to target field}'
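
For the sample input, the marker idea can be made concrete as follows. This is only a sketch and differs from the two commands above in a few ways: it marks the third delimiter (the one that opens column 4), keeps everything on one line instead of inserting a newline, and hard-codes the font tag from the question. The function name `strip_font_col4` and the marker text `@@MARK@@` are made up for this example, and the marker is assumed not to occur in the data.

```shell
#!/bin/sh
# Hypothetical helper: remove the red <font> wrapper from column 4 only.
strip_font_col4() {
    sed -e 's|</td><td>|</td>@@MARK@@<td>|3' \
        -e 's|@@MARK@@<td><font style=BACKGROUND-COLOR:red>\([^<]*\)</font>|<td>\1|' \
        -e 's|@@MARK@@||'
}
```

The first expression tags the start of column 4, the second unwraps the font element that directly follows the tag, and the third removes the marker when column 4 had no font element to unwrap. Usage: `strip_font_col4 < file`.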
