How does awk '!a[$0]++' work?

Question

This one-liner removes duplicate lines from text input without pre-sorting.

For example:

$ cat >f
q
w
e
w
r
$ awk '!a[$0]++' <f
q
w
e
r
$

The original code I found on the internet read:

awk '!_[$0]++'

This was even more perplexing to me as I took _ to have a special meaning in awk, like in Perl, but it turned out to be just a name of an array.

Now, I understand the logic behind the one-liner: each input line is used as a key in a hash array, thus, upon completion, the hash contains unique lines in the order of arrival.

What I would like to learn is how exactly this notation is interpreted by awk. E.g. what the bang sign (!) means and the other elements of this code snippet.

How does it work?

As it's a hash, it's unordered, so "in the order of arrival" isn't actually correct. — Kevin, 3 hours ago
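The ordering point can be checked directly. What follows is a minimal sketch, assuming a POSIX shell and any standard awk; the function names dedup_stream and dedup_at_end are made up for illustration. The one-liner prints each line the moment it is first seen, so its output follows input order regardless of how the array is stored. Only a variant that walks the array afterwards with for (k in a) depends on traversal order, which awk leaves unspecified.

```shell
# Prints each line as soon as it is first seen: output order = input order.
dedup_stream() {
    awk '!a[$0]++'
}

# Hypothetical contrast: collect everything, then walk the array in END.
# The order of for (k in a) is unspecified and may differ between awks.
dedup_at_end() {
    awk '{ a[$0]++ } END { for (k in a) print k }'
}

printf 'q\nw\ne\nw\nr\n' | dedup_stream
printf 'q\nw\ne\nw\nr\n' | dedup_at_end
```

The first call should reproduce the q, w, e, r output shown in the question; the second prints the same four lines, but possibly in a different order.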

Alexander Shcheblikin · Accepted Answer · 2014-10-07 01:36:44Z


Let's see:

 !a[$0]++

first

 a[$0]

We look at the value of a[$0] (array a, with the whole input line $0 as the key).

If it does not exist yet, the test evaluates to true (! is logical negation):

 !a[$0]

so awk prints the input line $0 (the default action).

Then we add one (++) to a[$0], so the next time the same line comes up, !a[$0] evaluates to false.
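The description above can be spelled out in long-hand. This is only a sketch of an equivalent program, not how awk itself evaluates the one-liner internally; the function name dedup_longhand is hypothetical, and a POSIX shell is assumed.

```shell
# Long-hand spelling that produces the same output as awk '!a[$0]++'.
dedup_longhand() {
    awk '{
        if (!a[$0]) {   # empty/zero the first time this line shows up
            print $0    # what awk does by default when a pattern matches
        }
        a[$0]++         # remember the line so later copies are skipped
    }'
}

printf 'q\nw\ne\nw\nr\n' | dedup_longhand
```

For the sample input from the question this prints q, w, e, r, the same as the one-liner.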

Nice find! You should have a look at code golf!

answered 12 hours ago by Archemar, edited 8 hours ago by Alexander Shcheblikin

Answer accepted with the minor edits. – Alexander Shcheblikin 8 hours ago

So the essence is this: the expression in the single quotes is used by awk as a test for each input line; every time the test succeeds, awk executes the action in curly braces, which, when omitted, defaults to {print}. Thanks! – Alexander Shcheblikin 8 hours ago


@Archemar: This answer is wrong, see mine. – Gnouc 6 hours ago


Gnouc · Answer 2 · 2014-10-07 05:46:45Z


Archemar's answer gets the order in which awk evaluates the expression wrong. Here is how it is actually processed:

a[$0]: look up the value stored under key $0 in the associative array a. If the key does not exist yet, awk creates it with an empty (uninitialized) value.
a[$0]++: increment the value of a[$0] and return the old value as the value of the expression. If a[$0] did not exist before, this returns 0 and sets a[$0] to 1 (the ++ operator always yields a numeric value).
!a[$0]++: negate the value of that expression. If a[$0]++ returned 0, the whole expression is true and awk performs the default action, print $0. Otherwise the whole expression is false and awk does nothing.
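The three steps can be made visible with a small trace. This is an illustrative sketch, assuming a POSIX shell; the function name trace_dedup and the variable old are made up, and the program prints the intermediate values instead of filtering.

```shell
# Show, for every input line, the value a[$0]++ returns, its negation,
# and the value left in the array afterwards.
trace_dedup() {
    awk '{
        old = a[$0]++   # post-increment: old is the value before the bump
        printf("%s old=%d negated=%d now=%d\n", $0, old, !old, a[$0])
    }'
}

printf 'q\nw\nw\nw\n' | trace_dedup
# q old=0 negated=1 now=1
# w old=0 negated=1 now=1
# w old=1 negated=0 now=2
# w old=2 negated=0 now=3
```

Only the lines where negated is 1 would be printed by the one-liner, which is exactly the first occurrence of each line.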

The source of the misunderstanding is that the ++ operator has higher precedence than the ! operator, so it takes effect before !.
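One way to convince yourself of the grouping is to write the parentheses explicitly and to inspect the array afterwards. A rough sketch, assuming a POSIX shell; the function names implicit, explicit and counts, and the helper array order, are hypothetical.

```shell
# The one-liner as usually written.
implicit() {
    awk '!a[$0]++'
}

# The same expression with the grouping spelled out.
explicit() {
    awk '!(a[$0]++)'
}

# Print each distinct line with the count left in a[] at the end.
# The counts keep growing, so ++ acts on the array element itself
# and ! only negates the value that ++ hands back.
counts() {
    awk '!a[$0]++ { order[++n] = $0 }
         END { for (i = 1; i <= n; i++) print order[i], a[order[i]] }'
}

printf 'q\nw\ne\nw\nr\n' | implicit
printf 'q\nw\ne\nw\nr\n' | explicit
printf 'q\nw\ne\nw\nr\n' | counts
```

Both spellings give the same deduplicated output, and counts reports 2 for w and 1 for every other line.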



Looks the same to me. It should be read as !(a[$0]++): we return the value (0 the first time, then 1, 2, 3, ...), then post-increment and negate. – Archemar 4 hours ago


@Archemar: Your answer indicates that ! is applied before ++. – Gnouc 4 hours ago

