Often, when manually grepping through a file, there are so many comments that your eyes glaze over, and you start to wish there was a way to display only the lines that are not comments.

Is there a way to skip comments with cat or another tool? I am guessing there is a way and that it involves a regular expression. I only want to change what is displayed, not actually remove any lines from the file.

Comments are in the form of # and I'm using zsh as my shell.


marked as duplicate by muru, Archemar, don_crissti, Kusalananda, heemayl Jan 17 at 11:13

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

2  
what is the comment format? #? /* ... */? // ? – Jeff Schaller Jan 17 at 1:41
4  
cat doesn't grep, just FYI – Jeff Schaller Jan 17 at 1:42
    
Just updated with the info. Any idea why somebody cast a close vote? – shirish Jan 17 at 7:03
    
    
@don_crissti it is similar, but the answers given there are different from the ones here. – shirish Jan 17 at 10:55
up vote 8 down vote accepted

Well, that depends on what you mean by comments. If you just want the lines without a #, then a simple:

grep -v '#'

might suffice (but this will also treat lines like echo '#' as comments). If comment lines are lines starting with #, then you might need:

grep -v '^#'

And if comment lines are lines starting with # after some optional whitespace, then you could use:

grep -v '^ *#'

And if the comment format is something else altogether, this answer will not help you.
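
To make the difference between the three patterns concrete, here is a small sketch run against a made-up sample file. The file contents and variable names are assumptions for illustration, not something from the question. The third pattern is written with [[:space:]] rather than a literal space so that tabs are covered too.

```shell
# Hypothetical sample input written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
  '# full-line comment' \
  '    # indented comment' \
  'echo "#"' \
  'ls -l # trailing comment' \
  'pwd' > "$sample"

# 1. Drop every line that contains a '#' anywhere.
no_hash=$(grep -v '#' "$sample")

# 2. Drop only lines whose first character is '#'.
no_leading_hash=$(grep -v '^#' "$sample")

# 3. Drop lines whose first non-blank character is '#'.
no_comment_lines=$(grep -v '^[[:space:]]*#' "$sample")

rm -f "$sample"

printf '%s\n' "--- 1 ---" "$no_hash" "--- 2 ---" "$no_leading_hash" "--- 3 ---" "$no_comment_lines"
```

Only pwd survives the first pattern. The third keeps echo "#", the ls -l line (together with its trailing comment) and pwd.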

2  
grep -v '#' won't suffice because it will also drop lines like echo "#". grep -v '^#' is not enough either, because comments can appear at any point, such as echo "hello world" # this is a comment – Serg Jan 17 at 2:02
    
@Serg, 100% agree just up-voted your comment. The above may get the OP going forward though. – Stephen Rauch Jan 17 at 2:05
1  
Just try grep -v '^ *#' to only match lines whose first non-space character is #. – terdon Jan 17 at 10:48

Just grepping will never be able to remove all comments (or only comments) because grep does not understand the language it is going through. To tell what is a comment and what isn't, you need a lexer that understands that particular language.

There are several answers on SO about how to remove all comments from specific programming languages. I'll add two examples here.

For C the answer by Josh Lee argues:

gcc -fpreprocessed -dD -E test.c

Which runs the preprocessor but keeps the macros.

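For Python the answer by unutbu (with a small adaptation by myself) writes a small lexer using tokenize:

import io
import sys
import tokenize


def nocomment(source):
    """Return source with all COMMENT tokens removed."""
    result = []
    # In Python 3, generate_tokens() expects a readline callable that
    # returns str, so wrap the text in StringIO (BytesIO needs bytes).
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for toknum, tokval, _, _, _ in tokens:
        if toknum != tokenize.COMMENT:
            result.append((toknum, tokval))
    # Given (type, string) pairs, untokenize() runs in compatibility mode:
    # the code is equivalent, but the original spacing is not preserved.
    return tokenize.untokenize(result)


if __name__ == "__main__":
    print(nocomment(sys.stdin.read()))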

You can then write one of these for each programming language and dispatch to the right one with a case statement. Assuming that the Python lexer is saved as remove-comments.py (executable and on your PATH):

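#!/bin/sh
# Pick a comment remover based on the file name extension.
# (No "break" is needed: each branch of a case ends at ";;".)
case "$1" in
  *.py)
    remove-comments.py < "$1"
    ;;
  *.c|*.C|*.cc)
    gcc -fpreprocessed -dD -E "$1"
    ;;
  *)
    echo "I do not know how to remove comments from $1, sorry" >&2
    exit 1
    ;;
esac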

Give a name to the script and add the lexers for the languages you need or use. This should be a more-or-less robust design for comment removal from different file types. (Using the file command to detect the type, instead of a case on file names, would be more robust still.)
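
The file-based dispatch mentioned above could look roughly like the sketch below. The MIME type strings are assumptions: they differ between versions of file, so check what yours prints. The function name lang_for_mime is made up for illustration.

```shell
#!/bin/sh
# Map a MIME type, as printed by `file --brief --mime-type`, to a language
# label. The type strings listed here are assumptions; adjust them to
# whatever your version of file reports.
lang_for_mime() {
  case "$1" in
    text/x-python|text/x-script.python) echo python ;;
    text/x-c|text/x-c++)                echo c ;;
    text/x-shellscript)                 echo shell ;;
    *)                                  echo unknown ;;
  esac
}

# Example: classify the files given on the command line, if file is installed.
if command -v file >/dev/null 2>&1; then
  for f in "$@"; do
    printf '%s: %s\n' "$f" "$(lang_for_mime "$(file --brief --mime-type "$f")")"
  done
fi
```

The label can then drive the same kind of case statement as before, in place of the file name.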


For bash scripts, this is possible with the set -vn command. -v puts bash in verbose mode, where each line is printed as it is read. -n tells bash to only read the script file without executing anything.

Example:

$ cat ./testscript.sh                                                                                                    
#!/bin/bash

# comment
set -vn
echo "Hello World" # another comment
$ ./testscript.sh                                                                                                        
echo "Hello World" # another comment

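As you can see, the shebang and the # comment line are not shown, while the in-line comment still is. Be aware, though, that those lines are missing only because bash read them before set -vn took effect: in verbose mode bash echoes every line it reads, so a comment line placed after set -vn would be printed as well. This is of course not ideal, but at least it doesn't require any external tools such as grep. I'm not aware of such features in other scripting languages.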

    
Is there something similar in zsh as well ? – shirish Jan 17 at 7:02
    
@shirish not a zsh user, so I don't know :) – Serg Jan 17 at 7:10

As mentioned in comments above, what format 'comments' take in your use-case makes a difference. Still, for several cases, this may be enough, without having to create a script.

The solution:

Reading the question suggests you're already using grep to search the files anyway, so pipe that through another grep, like this:

grep your_pattern your_file | grep --perl-regexp --invert-match '(?:^;)|(?:^\s*/\*.*\*/)|(?:^\s*(?:#|//|\*))'

What is not trapped:

This will still let through lines that have a 'trigger' character elsewhere in the line, lines that have comments at the end, as in echo "Hello World" # another comment, and lines that are part of a multi-line comment (except as noted in the explanation below).

If this is used as a post-filter to your grep these limitations should be negligible as most of the comments will still be filtered out and you won't worry "that your eyes glaze over" anymore.

The Explanation:

There are three patterns, which you can modify to suit your use-case if needed. The first, (?:^;), catches lines beginning with the ; character; it must be the very first character, with no leading white space. The second catches lines that begin with a one-line /* ... */ comment, with or without leading white space. The third catches lines, with or without leading white space, that begin with #, //, or *. The * in the last pattern helps to catch the lines inside a multi-line comment in the /* ... */ style, where a common convention is to run a column of * to connect the first and last lines. For example:

/************
 *
 * This is my
 * multi-line
 * comment.
 *
 ************/

The (?: ... ) notation around each pattern makes them 'non-capturing' groups, hopefully to increase speed and reduce resource consumption. The --perl-regexp option (short form -P) tells grep to use Perl regular expression rules, which allow the non-capturing groups and the unescaped | alternation operator; neither of these works in grep's default basic regular expressions. The grep man page does warn that the -P option is experimental, so do test before relying on it on your system. The --invert-match option (short form -v) tells grep to reverse the match, returning lines that fail the pattern. The two short options can be combined as -vP.

The reason to use this as a post-filter to your normal grep is three-fold. First, you can do your normal grepping, and only add the extra work of using this when you run into your problem of too many comments in the output. (Less typing and fewer resources used.) Second, you have probably already developed the patterns you commonly use, and the habits that go with them, and adding more complexity to them could break them. Debugging patterns when you don't have to is wasted work. Third, it doesn't do well with multi-line comments at all, but if you've already grepped the file for what you want, then it'll remove most, if not all, comments from the results, and serve your purpose.
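
If grep on your system was built without -P support, roughly the same filter can be written with extended regular expressions (-E) instead. This is a sketch of that alternative rather than the command from this answer; the sample lines and the function name drop_comment_lines are invented for the demonstration.

```shell
# Filter stdin, dropping lines that start with ';' or whose first
# non-blank text is '#', '//', '*' or a one-line /* ... */ comment.
drop_comment_lines() {
  grep -Ev '^;|^[[:space:]]*(#|//|\*)|^[[:space:]]*/\*.*\*/'
}

# Hypothetical input mixing several comment styles with real content.
filtered=$(printf '%s\n' \
  '; ini-style comment' \
  '/* one-line C comment */' \
  ' * middle of a block comment' \
  '# shell comment' \
  '  // C++ comment' \
  'x = a * b; // trailing comment' \
  'url = http://example.com' | drop_comment_lines)

printf '%s\n' "$filtered"
```

The two lines that remain both contain * or // in the middle, which shows that only the start of each line is tested.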

1  
@StephenRauch If the commenting # is preceded by white space it is still a comment, most of the time, but it is not stripped by ^#; you need to allow for white space with ^\s*#. – Gypsy Spellweaver Jan 17 at 10:12

grep -v "^#" your_file | grep -v "^$" | less

This removes the lines that start with "#" and also the empty lines, then sends the result to less for easier reading.
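
The two grep calls can also be folded into one. The sketch below uses extended regular expressions and additionally treats indented comments and whitespace-only lines as noise, which goes slightly beyond the command above; the function name and sample input are made up.

```shell
# Drop lines that are blank (or whitespace-only) or whose first
# non-blank character is '#'. Reads stdin, writes stdout.
strip_comments_and_blanks() {
  grep -Ev '^[[:space:]]*(#|$)'
}

# Hypothetical input for the demonstration.
result=$(printf '%s\n' '# header' '' 'PORT=8080' '   # indented' '   ' 'HOST=localhost' \
  | strip_comments_and_blanks)

printf '%s\n' "$result"
```

In practice you would run it as strip_comments_and_blanks < your_file | less.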


To do this for bash (or Bourne shell) files, you can take advantage of bash's declare -f functionname, which prints the function functionname with proper indentation and with comments removed (so you get your comments removed and, as a bonus, consistent indentation too):

BEAUTIFIER () {
  for f in "$@"; do
    printf "%s" "
      F_from_sh () {
        $(cat "$f")
      }
      echo ___ beautified version of $f : _________________
      declare -f F_from_sh | awk ' (NR > 2) && length>2' | sed -e 's/^  //'
    " | bash
  done
}

Then use as:

BEAUTIFIER script1.sh  script2.bash  etc

Please note that this gets rid of all comments in the script, even the "shebang" first line! You may also want to display the first line of $f.
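
One way to keep the shebang is to print the first line separately when it starts with #!. A minimal sketch, with a made-up helper name, that could be called just before the beautified output:

```shell
# Print the first line of a file only if it is a shebang line.
print_shebang() {
  first_line=$(head -n 1 "$1")
  case "$first_line" in
    '#!'*) printf '%s\n' "$first_line" ;;
  esac
}

# Demonstration on a throwaway script (hypothetical contents).
demo=$(mktemp)
printf '%s\n' '#!/bin/bash' '# comment' 'echo hi' > "$demo"
shebang=$(print_shebang "$demo")
rm -f "$demo"
printf '%s\n' "$shebang"
```

For a file whose first line is not a shebang, the helper prints nothing.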


Here is a simple way to remove comments, i.e. everything that comes after the last '#' on a line, using awk and sed.

[root@master]# cat hash
This is a program to remove comments from this file
#!/bin/bash
# comment
set -vn # comment
echo "Hello World" # another comment
echo "testscript for removing comments"
echo "Hello World" # another comment
echo 'This is a # sign' #comment
echo "This is a # sign" #comment

[root@master]# awk -F '#' 'BEGIN{OFS="#";} { if (!/#/) ;else $NF="";print $0}' hash | sed -n 's/#$//g;p'
This is a program to remove comments from this file
set -vn
echo "Hello World"
echo "testscript for removing comments"
echo "Hello World"
echo 'This is a # sign'
echo "This is a # sign"
1  
What if there's echo "This is a # sign" in the script ? – Serg Jan 17 at 7:40
    
It's not a script; I just showed it by editing a file. You only need this much: cat <file> | sed -n 's/#.*$//g;p' – Aljo Antony Jan 17 at 7:43
    
You misunderstood my question. What your sed command does is strip # and anything after it up to the end of the line. If there is a legitimate command, such as echo "Hello # world", it will chop off a portion of the command, thus introducing bugs if the user wants to copy the uncommented version of the script somewhere else. See this: paste.ubuntu.com/23815189 – Serg Jan 17 at 8:08
    
In other words, this approach will work, but only if there's no # sign within commands themselves – Serg Jan 17 at 8:10
    
I updated my question; I use zsh. Also, you need to format your code a bit, please, as right now it's a bit hard to parse. I have edited it a bit so that it's easier to parse now. – shirish Jan 17 at 10:33
