Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems.

I am wondering whether there is a name for this simple function, which returns the rank of each number in an array. I would love to do this ranking in a minimalist way with basic Unix commands, but I cannot come up with anything other than a basic find-and-loop, which is not very elegant. Assume you have an array of numbers:

17 
94 
3 
52 
4 
4 
9

Expected output, where duplicates simply receive the same ID (how duplicates are handled is not critical, so feel free to take shortcuts):

4 
6 
1 
5 
2 
2 
3        

Motivation: today I saw many users solving this problem in many different ways and doing a lot of manual steps in a spreadsheet, so I started to think about a minimalist way to do it.

Comparing the ranking algorithm to Google's Average ranking

In a Google spreadsheet, =arrayformula(rank.AVG(A:A,A:A,true)) gives a benchmark ranking in ascending order, like the first expected output:

17  5
94  7
3   1
52  6
4   2.5
4   2.5
9   4

This shows that my initial ranking algorithm is biased. I think being able to set the dataset location would be helpful here.
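For reference, an average rank gives each value the mean of the 1-based positions it occupies in the ascending sorted list. Here P_x is notation introduced only for this illustration: the set of positions where x appears. With the sample sorted as 3, 4, 4, 9, 17, 52, 94:

```latex
\operatorname{rank}_{\text{avg}}(x) = \frac{1}{\lvert P_x \rvert} \sum_{p \in P_x} p,
\qquad
\operatorname{rank}_{\text{avg}}(4) = \frac{2 + 3}{2} = 2.5,
\qquad
\operatorname{rank}_{\text{avg}}(17) = 5
```

This matches the 2.5 and 5 in the spreadsheet output.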

Apart from being in reverse order, the minor "biased" difference comes from counting duplicated items as 2 places instead of one. – JJoao 2 days ago

If that list was in a file, one per line, I'd do something like:

sort -nu file |
  awk 'NR == FNR {rank[$0] = NR; next}
      {print rank[$0]}' - file

If it was in a zsh $array:

sorted=(${(nou)array})
for i ($array) echo $sorted[(i)$i]

That's the same principle as the awk version above: the rank is the index (NR / (i)) in the numerically (-n / (n)) ordered (sort / (o)), de-duplicated (-u / (u)) list of elements.
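The expected output in the question is what is usually called a dense ranking: ties share a rank and no ranks are skipped. As a minimal sketch, assuming the numbers arrive on a pipe rather than in a file, the awk approach can be wrapped in a shell function (dense_rank is a hypothetical name) that first copies stdin to a temporary file, because the data has to be read twice. It also assumes the mktemp utility is available.

```shell
# Hypothetical helper: dense ranking of numbers read from stdin, one per line.
# Assumes the mktemp utility is available.
dense_rank() {
  tmp=$(mktemp) || return 1
  cat > "$tmp"                      # buffer stdin: awk must read the data twice
  # First input (-): sorted, de-duplicated values, so NR is the rank.
  # Second input: the original data, printed as ranks in input order.
  sort -nu "$tmp" |
    awk 'NR == FNR {rank[$0] = NR; next} {print rank[$0]}' - "$tmp"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, printf '17\n94\n3\n52\n4\n4\n9\n' | dense_rank prints 4, 6, 1, 5, 2, 2, 3, one per line. As with the file-based version, values must be written consistently (4 and 4.0 would not be matched up).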

For your average rank:

sort -n file |
  awk 'NR == FNR {rank[$0] += NR; n[$0]++; next}
      {print rank[$0] / n[$0]}' - file

Which gives:

5
7
1
6
2.5
2.5
4

(use sort -rn to reverse the order like in your Google Spreadsheet version).
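A minimal sketch that packages the average-rank pipeline as a shell function (avg_rank is a hypothetical name) and accepts an optional -r second argument to switch to sort -rn, i.e. descending ranks. It assumes the data is in a regular file that can be read twice.

```shell
# Hypothetical helper: average rank of each number in FILE, in input order.
# Usage: avg_rank FILE [-r]   (-r ranks the largest value first)
avg_rank() {
  file=$1
  if [ "${2-}" = "-r" ]; then order=-rn; else order=-n; fi
  # Tied values get the mean of the sorted positions they occupy.
  sort "$order" "$file" |
    awk 'NR == FNR {sum[$0] += NR; n[$0]++; next}
         {print sum[$0] / n[$0]}' - "$file"
}
```

With the sample numbers in file, avg_rank file prints 5, 7, 1, 6, 2.5, 2.5, 4, and avg_rank file -r prints 3, 1, 7, 2, 5.5, 5.5, 4.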

Please see the benchmark of Google's average ranking in the body. Maybe it can simplify your proposal. Being able to set the dataset location to get a biased and/or unbiased ranking would be great. – Masi 2 days ago
@Masi, see edit for average ranking. I don't follow your sentence about biased/unbiased and dataset location. Possibly your question needs more context. – Stéphane Chazelas 2 days ago
nl x | sort  -k 2n | nl | sort -k 2n | cut -f1

... it behaves slightly differently with duplicates: tied values get consecutive ranks (2 and 3 here) instead of sharing one:

 nl x | sort  -k 2n | nl | sort -k 2n | cut -f1,3
 5  17 
 7  94 
 1  3 
 6  52 
 2  4 
 3  4 
 4  9
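To make the nl pipeline easier to follow, here it is as a commented shell function (ordinal_rank is a hypothetical name). Each stage tags the lines with a number and sorts on a different field; the number added by the second nl ends up being the rank. Because every line gets its own number, tied values receive consecutive ranks. It assumes the file has no blank lines, which nl leaves unnumbered by default.

```shell
# Hypothetical helper: rank each line of FILE by numeric value; ties get
# consecutive (distinct) ranks. Same pipeline as above, annotated.
ordinal_rank() {
  # 1. nl prefixes each line with its line number and a TAB: "     1<TAB>17"
  # 2. sort -k 2n orders the lines by value (field 2)
  # 3. the second nl numbers the sorted lines: that number is the rank
  # 4. sort -k 2n restores the input order (field 2 is now the line number)
  # 5. cut -f1 keeps only the rank (right-aligned, with leading spaces)
  nl "$1" | sort -k 2n | nl | sort -k 2n | cut -f1
}
```

With the sample data, stripping the padding (for example with tr -d ' ') gives 5, 7, 1, 6, 2, 3, 4.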
Please see the benchmark of Google's average ranking in the body. Maybe it can simplify your proposal. Being able to set the dataset location to get a biased and/or unbiased ranking would be great. – Masi 2 days ago
@masi, apart from duplicates, you get Google's ranking if you just sort in reverse order: nl x | sort -k 2rn | nl | sort -k 2n | cut -f1,3 – JJoao 2 days ago

With just GNU awk:

awk '
    FNR == NR {numbers[$1]=1; next} 
    FNR == 1 {
        n = asorti(numbers, sorted, "@ind_num_asc")
        for (i=1; i<=n; i++) rank[sorted[i]] = i
    }
    {print rank[$1]}
' file file
Note that numbers[$1]=1 can be simplified to just numbers[$1] as you don't care about the values of that hash. – Stéphane Chazelas 2 days ago
I find that too obscure for my tastes, and prefer an assignment to create the array element. – glenn jackman 2 days ago
@masi, how is that "Google average ranking" implemented? I strongly suspect it will not simplify my code at all. – glenn jackman 2 days ago
On the other hand, I did wonder what the significance of assigning the value 1 to the array was. For me, h[key] is idiomatic for creating the key of a hash, while a[key]=1 would be idiomatic for giving a true value to the hash element for that key. – Stéphane Chazelas 2 days ago
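JJoao's comment under the question notes that the spreadsheet result counts duplicated items as two places. If you want ties to share a rank but still use up those places (often called standard competition, or "1224", ranking), a minimal sketch in the same sort-plus-awk style keeps only the first sorted position of each value. competition_rank is a hypothetical name, and the data is assumed to be in a regular file.

```shell
# Hypothetical helper: standard competition ("1224") ranking of FILE.
# Tied values share the lowest sorted position they occupy, and the next
# distinct value skips the places the ties used up.
competition_rank() {
  sort -n "$1" |
    awk 'NR == FNR { if (!($0 in rank)) rank[$0] = NR; next }
         { print rank[$0] }' - "$1"
}
```

With the sample data it prints 5, 7, 1, 6, 2, 2, 4, which agrees with the spreadsheet's average ranking everywhere except the tied 4s.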
