Bash script to extract HTML comment into a Markdown file

Question

I learned Bash a million years ago. I just wrote this simple script used to get the first lot of HTML comments from a file, and spit it out in order to create a README.md file.

It is just. So. Ugly. I read bits and pieces over the years, and I am sure it can be improved so much...

Here we go:

#!/bin/bash

IFS=''
active='0';
cat  hot-form-validator.html | while read "line";do

  echo $line | grep '\-\->' > /dev/null
  if [ $active = '1' -a $? = '0' ];then 
    exit 0;
  fi;

  suppress=0;
  echo $line | grep '^ *@' > /dev/null
  if [ $? = '0' ];then suppress='1'; fi;


  if [ $active = '1' -a $suppress = '0' ];then echo $line;fi;

  echo $line | grep "<!--" > /dev/null
  if [ $? = '0' ];then active='1'; fi;

done

Questions:

Is there a better way to do grep and then check $?? Back in the day it was the way to go, but...
Should active be a proper number rather than a string with a number? I know, it could be anything... but having a string that can be 0 or 1 just feels wrong.
Is there a better way to preserve spaces, rather than zapping IFS?
Any more pearls of wisdom, other than quitting my (short lived) career of bash scripter?

Tolani Jaiye-Tikolo · Answer 1 · 2016-07-18 18:10:58Z

A bug

If your html contains a backspace character e.g

<p>The special character \n is a way to include read line</p>

your code interprets it a

The special character n is a way to include read line.

to avoid this, add "-r" switch to your read line

cat hot-form-validator.html | while read -r "line";do

You want to check if your file exists before reading it

if [ -f $filename ]; then
#do something
else
#do something
fi

200_success · Answer 2 · 2016-07-18 19:01:32Z

up vote 1 down vote

A Bash loop is not the best approach. If you are doing a lot of line-by-line processing, and invoking grep a lot, then awk would be a more appropriate tool.

awk '/<!--/ { ACTIVE = 1; next }
     /-->/  { exit }
     ACTIVE { print }' < hot-form-validator.html

But line-by-line processing is not an appropriate way to parse HTML. xsltproc, for example, could do a proper job of extracting the first comment in an HTML file:

xsltproc --html first-comment.xsl hot-form-validator.html

… where first-comment.xml contains:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/"><xsl:value-of select="//comment()[1]"/></xsl:template>
</xsl:stylesheet>

answered yesterday

200_success♦

97.5k11114333

I don't really want turn xsltproc a requirement to generate the documentation -- especially since I control the contents of the starting file. But, I am SOLD on AWK! What's the best resource to learn it? – Merc yesterday

But also... what about the script in the original post? What if I really wanted to improve it? – Merc yesterday

This tutorial seems good. Or, if you are a masochist, you could try to figure it all out from the man page. – 200_success♦ yesterday

add a comment |

asked	yesterday
viewed	48 times
active	yesterday

current community

your communities

more stack exchange communities

Bash script to extract HTML comment into a Markdown file

2 Answers 2

Your Answer

Not the answer you're looking for? Browse other questions tagged html parsing bash markdown or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Bash script to extract HTML comment into a Markdown file

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged html parsing bash markdown or ask your own question.

Related

Hot Network Questions