Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems.

This one-liner removes duplicate lines from text input without pre-sorting.

For example:

$ cat >f
q
w
e
w
r
$ awk '!a[$0]++' <f
q
w
e
r
$ 
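A non-interactive way to reproduce the example above, as a minimal sketch assuming a POSIX shell and any POSIX awk. The name dedup is a hypothetical helper wrapping the one-liner, not something from the original post:

```shell
#!/bin/sh
# dedup is a hypothetical helper name wrapping the one-liner.
dedup() {
    awk '!a[$0]++'
}

# Same five lines as the interactive example: q, w, e, w, r.
printf 'q\nw\ne\nw\nr\n' | dedup
# prints q, w, e and r, one per line (the second w is dropped)
```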

The original code I found on the internet read:

awk '!_[$0]++'

This was even more perplexing to me, as I took _ to have a special meaning in awk, as in Perl, but it turned out to be just the name of an array.
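Since _ is an ordinary awk identifier (names may contain letters, digits and underscores), both spellings should behave identically. A small sketch to check this, assuming a POSIX shell and awk; the variable names are only for illustration:

```shell
#!/bin/sh
# Same input as the question: q, w, e, w, r.
input='q
w
e
w
r'

# _ is just a name here, so both programs use a plain associative array.
with_underscore=$(printf '%s\n' "$input" | awk '!_[$0]++')
with_a=$(printf '%s\n' "$input" | awk '!a[$0]++')

if [ "$with_underscore" = "$with_a" ]; then
    echo "same output"
fi
```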

Now I understand the logic behind the one-liner: each input line is used as a key in a hash array, so upon completion the hash contains every unique line, and because each line is printed the first time it arrives, the output keeps the order of arrival.
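A sketch of the difference between the printed output and the array itself, assuming a POSIX shell and awk (variable names are illustrative). The one-liner prints as it goes, so its output follows the input; walking the finished array with for (k in a) visits the same keys, but POSIX leaves that traversal order unspecified:

```shell
#!/bin/sh
input='q
w
e
w
r'

# Printed as each line is first seen, so this follows the input order.
in_order=$(printf '%s\n' "$input" | awk '!a[$0]++')

# Same keys, but for (k in a) may visit them in any order.
from_array=$(printf '%s\n' "$input" | awk '{ a[$0]++ } END { for (k in a) print k }')

printf 'one-liner:\n%s\nfor-in:\n%s\n' "$in_order" "$from_array"
```

Only the set of keys in from_array is predictable; its order depends on the awk implementation.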

What I would like to learn is how exactly awk interprets this notation, e.g. what the bang sign (!) means and what the other elements of this snippet do.

How does it work?

Title is misleading: it should be $0 (zero), not $o (letter o). –  Archemar 12 hours ago
    
As it's a hash, it's unordered, so "in the order of arrival" isn't actually correct. –  Kevin 3 hours ago

2 Answers

Accepted answer (score 3)

Let's see:

 !a[$0]++

First:

 a[$0]

We look at the value of a[$0] (array a, with the whole input line $0 as the key).

If it does not exist, the negated test (! is negation) evaluates to true:

 !a[$0]

so we print the input line $0 (the default action).

Then we add one (++) to a[$0], so the next time the same line is read, !a[$0] will evaluate to false.
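Written out as separate statements, the steps in this answer look like the sketch below (assuming a POSIX shell and awk; spelled_out is a hypothetical helper name). Because the test and the increment are separate statements here, their order is explicit; the output matches the one-liner:

```shell
#!/bin/sh
# spelled_out is a hypothetical helper name.
spelled_out() {
    awk '{
        if (!a[$0])     # not seen yet: a[$0] is still empty (numerically 0)
            print $0    # the default action of the one-liner
        a[$0]++         # count this line so later copies are skipped
    }'
}

printf 'q\nw\ne\nw\nr\n' | spelled_out
```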

Nice find! You should have a look at Code Golf!

Answer accepted with the minor edits. –  Alexander Shcheblikin 8 hours ago
    
So the essence is this: the expression in the single quotes is used by awk as a test for each input line; every time the test succeeds, awk executes the action in curly braces, which, when omitted, is {print}. Thanks! –  Alexander Shcheblikin 8 hours ago
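The pattern/action reading in this comment can be checked by writing the same filter three ways. A sketch assuming a POSIX shell and awk; the variable names are illustrative:

```shell
#!/bin/sh
input='q
w
e
w
r'

# Pattern only: the omitted action defaults to { print }.
short=$(printf '%s\n' "$input" | awk '!a[$0]++')
# Same pattern with the action written out.
explicit=$(printf '%s\n' "$input" | awk '!a[$0]++ { print }')
# No pattern (the action runs for every line); the test moved into an if.
in_action=$(printf '%s\n' "$input" | awk '{ if (!a[$0]++) print }')

if [ "$short" = "$explicit" ] && [ "$short" = "$in_action" ]; then
    echo "all three agree"
fi
```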
@Archemar: This answer is wrong, see mine. –  Gnouc 6 hours ago

Archemar's answer gets the order in which awk evaluates the expression wrong. Here is the actual processing:

  • a[$0]: look up the value of key $0 in the associative array a. If the key does not exist, it is created (with an empty value, which is 0 in numeric context).

  • a[$0]++: increment the value of a[$0] and return the old value as the value of the expression. If a[$0] did not exist, the expression returns 0 and a[$0] becomes 1 (the ++ operator returns a numeric value).

  • !a[$0]++: negate the value of the expression. If a[$0]++ returns 0, the whole expression evaluates to true, which makes awk perform the default action, print $0. Otherwise, the whole expression evaluates to false and awk does nothing with that line.
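To see the value that a[$0]++ returns, one can print it next to the count stored afterwards. A minimal sketch assuming a POSIX shell and awk; trace is a hypothetical helper name, and the input is the q, w, e, w, r example from the question:

```shell
#!/bin/sh
# trace is a hypothetical helper name. For each line it prints the line,
# the value returned by a[$0]++ (the old value), and the value stored after.
trace() {
    awk '{ old = a[$0]++; print $0, old, a[$0] }'
}

printf 'q\nw\ne\nw\nr\n' | trace
# q 0 1
# w 0 1
# e 0 1
# w 1 2
# r 0 1
```

Only the second w gets a non-zero old value, which is exactly the line the one-liner suppresses.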

The misunderstanding here is that the ++ operator has higher precedence than the ! operator, so ++ is applied first and ! negates the value it returns.
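The precedence can be observed on a plain variable. A sketch assuming a POSIX shell and awk (precedence_demo is a hypothetical helper name): !x++ is parsed as !(x++), so the negation applies to the old value while x is still incremented:

```shell
#!/bin/sh
# precedence_demo is a hypothetical helper name.
precedence_demo() {
    awk 'BEGIN {
        x = 0
        r = !x++       # parsed as !(x++): x++ yields 0, so r is 1; x is now 1
        print r, x
        r = !x++       # x++ yields 1, so r is 0; x is now 2
        print r, x
        y = 0
        s = !(y++)     # explicit parentheses: same result as !y++
        print s, y
    }'
}

precedence_demo
# 1 1
# 0 2
# 1 1
```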

Note:

Looks the same to me: it should be read as !(a[$0]++); we return the value (0 the first time, then 1, 2, 3, ...), then post-increment and negate. –  Archemar 4 hours ago
@Archemar: Your answer indicates that ! is applied before ++. –  Gnouc 4 hours ago
