Code Review Stack Exchange is a question and answer site for peer programmer code reviews.

I want to generate a list of N different random numbers:

public static List<int> GetRandomNumbers(int count)
{
    List<int> randomNumbers = new List<int>(); 

    for (int i=0; i<count; i++) 
    {    
        int number;

        do number = random.Next();
        while (randomNumbers.Contains(number));

        randomNumbers.Add(number);
    }

    return randomNumbers;
}

But I feel there is a better way. This do...while loop especially makes this ugly. Any suggestions on how to improve this?

Ahah! I knew that someday someone would need this! Your actual problem isn't finding random numbers, it's shuffling numbers randomly. Those are two totally different problems. Randomness comes from things being random, so if the numbers can't repeat you can't really call the sequence random anymore; it has the very particular property of not repeating. –  Bruno Costa yesterday

@BrunoCosta But if the numbers are in a "random" order, why can't we say that they are random? –  Kao yesterday

Because a sequence of random numbers will eventually contain a repeated number. You can say the numbers are randomly shuffled, but you can't say the sequence itself is random, because it contains no repeats. –  Bruno Costa yesterday

Isn't this a case for the Fisher–Yates shuffle? –  Martin R yesterday

@MartinR Not exactly. The Fisher–Yates shuffle gives you a permutation of all elements in a given range. –  Kao yesterday

9 Answers

Yes, there definitely is.

You generate a collection of elements, shuffle it, and start pulling items out of it. A quick one-liner would be:

Enumerable.Range(0,100).OrderBy(x => Guid.NewGuid()).Take(20);

or alternatively

Enumerable.Range(0,100).OrderBy(x => random.Next()).Take(20);

This will give you 20 unique random values from the range 0 to 99 (Enumerable.Range(0, 100) produces 100 values starting at 0).

The difference with your approach is that you have a worst-case scenario of infinity: what if you are reaaaaally unlucky and end up with the same value constantly? You'll never get your required amount of random values.

My approach, on the other hand, first generates 100 values and then takes a subset of them, which you can criticize for its memory impact. Considering you used random.Next(), which draws from half the integer range (0 to int.MaxValue), you should indeed be wary of that memory impact.

It also depends on your specific situation: if you have a very large pool of values (1,000,000) and you need 2 random values, then your approach will be much better. But when you need 999,999 values from that same pool, the choice becomes debatable.

It can take a long time to generate those last values, as you can test for yourself with this:

void Main()
{
    var random = new Random();
    var times = new TimeSpan[512];
    var sw = new Stopwatch();

    for(int i = 0; i < times.Length; i++) 
    {   
        sw.Restart();
        while(true) {
            int rand = random.Next();

            if(rand < 7894512 && rand > 7894500) {
                break;
            }
        }
        sw.Stop();
        times[i] = sw.Elapsed;
        Console.WriteLine ("Elapsed time: " + sw.Elapsed);
    }

    var orderedTime = times.OrderBy(x => x);
    for(int i = 0; i < 512; i++)
    {
        Console.WriteLine (orderedTime.ElementAt(i));
    }
}

It loops 512 times, each time drawing random values until it hits one of the (arbitrarily chosen by me) values between 7894500 and 7894512. When you execute this you'll notice it can take a long time to find the last values: sometimes it's fast and takes 39 ms, other times it takes over a minute. Evidently it's fast at the start and slow at the end.

Compare that to the memory overhead of my approach, which first allocates 32 million integers plus GUIDs, padding, object overhead, etc., and you're out a big chunk of memory.

You might be able to improve it by using a more "real" shuffling algorithm which doesn't have the object and guid overhead.
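One such improvement is a partial Fisher–Yates shuffle: shuffle only as many positions as you need and yield them as you go. This is a sketch, not the answer's own code; it assumes a shared `Random` instance passed in, a `using System.Linq;` directive, and the helper name `TakeRandom` is mine:

```csharp
// Partial Fisher-Yates: only the first `count` positions of the pool are
// shuffled, so the per-element cost is constant and there is no GUID overhead.
static IEnumerable<int> TakeRandom(int populationSize, int count, Random random)
{
    int[] pool = Enumerable.Range(0, populationSize).ToArray();
    for (int i = 0; i < count; i++)
    {
        // Pick a random element from the not-yet-chosen tail and swap it forward.
        int j = random.Next(i, pool.Length);
        (pool[i], pool[j]) = (pool[j], pool[i]);
        yield return pool[i];
    }
}
```

This still allocates the whole pool up front, so it only removes the sorting and GUID costs, not the memory cost.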

Ultimately in an extreme situation where you need 32 million unique values in a randomized order out of a total population of 32 million + 1, you will either have to accept a big memory overhead or a big execution time overhead.


One last edit before this topic can be laid to rest from my part: I talked about it with @rolfl in chat and we have come to the conclusion that each of our solutions has its use, depending on what your situation is exactly. Summarized, it comes down to this:

If you have a big range of values (like 0 to int.MaxValue) then my solution will eat any memory your PC has. You're looking at two collections with each 2.1 billion integers which you then take a slice from.

In my solution you first allocate this entire population, then you sort it (a different data structure) and then you take some of it. If this "some" is not close to 2.1 billion, you will have paid a huge allocation cost for data you never use.

How does this compare to @rolfl's answer? He basically allocates the data as it is needed: if he needs 32 million values then he will only allocate those 32 million (x2) and not more. If he needs 2.1 billion then he'll end up with a memory footprint like I have but that's an uncommon worst case scenario while for me it is standard behaviour.

The main disadvantage of his approach is that when the amount of values you want reaches the total population, it will become increasingly harder to get those last values because there will be many collisions. Yet again, this will only be a problem when the population is actually really big.

So when should you use my solution? Like most things, there is a tradeoff between readability and performance. If you have a relatively small population, the readability gain outweighs the performance impact, in my opinion. And when the amount of values you want is near the size of that population, my solution will have a memory impact similar to the other approach while also avoiding many collisions.

Take your pick depending on what your situation is.

Excellent answer. If you want one million numbers and have added 999,999 of them, there's literally a one-in-a-million chance that the next number is the one you want. –  Simon André Forsberg yesterday

-1 because this code is not compatible with the OP's requirements as is, and would likely not complete successfully with a large range (MIN to MAX value for int). –  rolfl yesterday

If you need 999,999 values from 1,000,000, then your code will require about 16MB just to store the GUIDs (assuming each is stored as a compact 16-byte value, not as a string or something), plus another 4MB for the int values, plus object overheads, and then some more too, I am sure. –  rolfl yesterday

Additionally, GUIDs are not specified to be random, and ordering by a GUID will not necessarily produce the results you expect. –  rolfl yesterday

Just replace it with a Random and you wouldn't complain about it anymore, @rolfl: Random r = new Random(); Enumerable.Range(0,100).OrderBy(x => r.Next()).Take(20).Dump(); –  Bruno Costa yesterday

To break your problem down in to requirements:

  • you need a set of random numbers
  • the numbers need to be unique
  • the order of the returned numbers needs to be random

The problem you have in your code is that you are trying to do too much at once. If you break the problem down in to these stages, you will be able to solve things more elegantly.

Your do-while loop is conceptually OK (it is fine to keep generating numbers until you get one that has not been produced before), but the problem is that the randomNumbers.Contains(number) check is an \$O(n)\$ operation. The larger the data set, the slower it becomes. Because you perform this operation 'count' times, and because the size of the list grows with count, your duplicate check is slower than it needs to be.

A better data structure at this point is a HashSet. If you add a duplicate value to a HashSet, it does not grow, and the performance of the add operation does not depend on the amount of data already in the HashSet. You can use the Count of the HashSet to manage uniqueness.

Once you have a clean set of unique random numbers, you can dump them in to a List, then shuffle the list using an efficient shuffle.

Combining these data structures, in the right way, leads to an \$O(n)\$ solution, which will scale fairly well.

Here is some code, which also works on Ideone. Note that my C# is weak, so I have tried to keep the logic clear.

using System;
using System.Collections.Generic;

public class Test
{
    static Random random = new Random();

    public static List<int> GenerateRandom(int count)
    {
        // generate count random values.
        HashSet<int> candidates = new HashSet<int>();
        while (candidates.Count < count)
        {
            // May strike a duplicate.
            candidates.Add(random.Next());
        }

        // load them in to a list.
        List<int> result = new List<int>();
        result.AddRange(candidates);

        // shuffle the results:
        int i = result.Count;  
        while (i > 1)
        {  
            i--;  
            int k = random.Next(i + 1);  
            int value = result[k];  
            result[k] = result[i];  
            result[i] = value;  
        }  
        return result;
    }
    public static void Main()
    {
        List<int> vals = GenerateRandom(10);
        Console.WriteLine("Result: " + vals.Count);
        vals.ForEach(Console.WriteLine);
    }
}
Why do you shuffle the results? I'm confused about this because everyone seems to include this in their answers, but it isn't in the OP's question. –  TopinFrassi 20 hours ago

@TopinFrassi: Because rolfl ordered the numbers (to check for duplicates) but they shouldn't be ordered. –  Charles 19 hours ago

@TopinFrassi is right: he could just return candidates and it would be fine. Without the shuffling part it's only slightly better than the OP's answer, because it uses a HashSet, which makes it less expensive to find a duplicate. I said this was a problem about shuffling, but of course there are many implementations that fit the requirement. –  Bruno Costa 16 hours ago

@BrunoCosta - The HashSet will impose a non-random (though unstructured) order on the data in the list. The OP's code would produce a random order in the result. The shuffle is required to retain the randomly ordered output. As for the 'slightly better' part, the HashSet will be about N/4 times faster than the List (where N is the count), so for large N, say a count of 10,000, the HashSet will be 2500 or so times faster... which, I guess, could be 'slight'. –  rolfl 16 hours ago

@rolfl Gotcha. I always forget that hash sets don't guarantee any order. I didn't mean "slightly" to downplay the difference it may make in a real-case scenario. But you forgot to mention that you need one more O(n) operation, the shuffling. Obviously, the use of the HashSet makes up for it really quickly. –  Bruno Costa 16 hours ago

Instead of using a List<int>, you should use a HashSet<int>. The HashSet<T> prohibits multiple identical values, and its Add method returns a bool that indicates whether the element was actually added to the set. This way you could change this code:

public static List<int> GetRandomNumbers(int count)
{
    List<int> randomNumbers = new List<int>(); 

    for (int i=0; i<count; i++) 
    {    
        int number;

        do number = random.Next();
        while (randomNumbers.Contains(number));

        randomNumbers.Add(number);
    }

    return randomNumbers;
}

to this :

public static IEnumerable<int> GetRandomNumbers(int count)
{
    HashSet<int> randomNumbers = new HashSet<int>();

    for (int i = 0; i < count; i++)
        while (!randomNumbers.Add(random.Next()))
            ; // Add returns false for a duplicate, so keep trying until a new value sticks

    return randomNumbers;
}

Note that I changed the return type from List<int> to IEnumerable<int>. If you don't plan to add/remove values from your list, you should return IEnumerable<int>; if you do plan to, return ICollection<int>. You shouldn't return List<int>, because it is an implementation detail and List<T> isn't designed to be extended.

The problem with using a set is that it's not ordered. There's no way to make it put the integers in a random order. –  Simon André Forsberg yesterday

@SimonAndréForsberg The OP didn't specify he wanted to do so, unless there's something I'm missing. –  TopinFrassi yesterday

In fact, you are correct. I totally misread the problem. –  Simon André Forsberg yesterday

Happens to everyone! –  TopinFrassi yesterday

You shouldn't call ToList there. Aside from that, it is a great solution. –  Bruno Costa 16 hours ago

Though the solution by @Jeroen Vannevel is quite simple and works for most intents and purposes, you will see especially poor performance if you want just a few numbers out of a very large range.

In that case I recommend the following algorithm:

  1. Pick one number out of the range
  2. Pick one number out of the remaining range
  3. and so on ...

The implementation could look like so:

  1. Choose a number from 0 to N
  2. Choose a number from 0 to N-1
  3. And so on
  4. Determine which number each of your drawings represents

Example for 0 to 10

  1. draw 4
  2. draw 7
  3. ...
  4. Results in: the 4 represents 4, the 7 represents 8
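The steps above can be sketched as follows. This is my own illustration, not the answer's code: the helper name `DrawUnique` is hypothetical, and I keep earlier picks in a sorted `List<int>` because the shifting step is only correct when previous values are visited in ascending order:

```csharp
// Draw `count` unique values from 0..n-1 by drawing from a shrinking range
// and shifting each draw past the values already taken.
static List<int> DrawUnique(int n, int count, Random random)
{
    var sorted = new List<int>();   // previously chosen values, kept ascending
    var result = new List<int>();
    for (int i = 0; i < count; i++)
    {
        int draw = random.Next(0, n - i);   // a value in 0 .. n-i-1
        foreach (int taken in sorted)
        {
            if (draw >= taken) draw++;      // skip past each value already chosen
            else break;                     // sorted, so no later value can apply
        }
        int idx = sorted.BinarySearch(draw);
        sorted.Insert(~idx, draw);          // BinarySearch's complement is the insert point
        result.Add(draw);
    }
    return result;
}
```

With n = 11, a first draw of 4 stays 4; a second draw of 7 (from 0..9) is shifted past the taken 4 and becomes 8, matching the worked example above.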
Could you add a code example of your algo? –  TopinFrassi yesterday

This will work, but will add some overhead: after you've randomized 5 and 6, for example, and you get a 7 as the next number, I assume you intend that 7 to be modified into a 9 (because 5 and 6 are already gone)? –  Simon André Forsberg yesterday

@SimonAndréForsberg Exactly. You would have to test when the overhead is worth the extra effort, but as the range increases there should be a turning point where this becomes more efficient. –  Dennis Jaheruddin yesterday

I see the implication here, and I mistook the algorithm you suggest. You are correct, there will be no skewness, but your answer leaves a very big blank over how to manage the range, especially given that the range is the whole valid 32-bit int space. You will still need to iterate over all the previously selected values to make it work. –  rolfl yesterday

This needs to be explained better. (The part where the second number is increased by 1 if it is bigger than the first, and the 3rd by 2 if bigger than both, or by 1 if bigger than one, etc.) –  ANeves 22 hours ago

The algorithm looks fine to me.

I suggest you always use explicit braces around blocks even when they have a single statement:

do
{
    bar();
}
while (foo);

I would also rename the method to GetNRandomNumbers, to make it slightly clearer.


If there are 4 possible numbers (1-4) and you have generated 1 number already, that means there are 4 - 1 = 3 possible numbers left. Generate a random number in that reduced range of 3; for every already-generated number it is greater than or equal to, increase it by 1.

Let's say the number already generated is 2, and you want to generate another:

Drawing range: [1,2,3]. Possible results: [1,3,4] (a draw of 2 or 3 becomes 3 or 4).

Solution:

public static int[] GetRandomNumbers(int count, int minValue, int maxValue)
{
    int[] randomNumbers = new int[count];

    for (int i = 0; i < count; ++i)
    {
        // Given the (minValue..maxValue) range, only (range - i) unique numbers are left
        int number = random.Next(minValue, maxValue - i);

        // Walk the previously generated numbers in ascending order; every one
        // that the draw is greater than or equal to shifts the draw up by 1
        foreach (int previous in randomNumbers.Take(i).OrderBy(x => x))
        {
            if (number >= previous)
                ++number;
            else
                break;
        }

        randomNumbers[i] = number;
    }

    return randomNumbers;
}

You have two parameters: the number N of integers you want, and their maximum size M >= N. If N is close to M, e.g., you want 500 numbers in the range 1..1000, then your best bet is to (partially) shuffle the numbers 1..M. For example, you can run the first N steps of a Fisher-Yates shuffle on 1..M and get the numbers you need.

Otherwise, if M is a lot bigger than N, you would be better off generating all your random numbers at once (rather than checking if they're in the list one-by-one), sorting, removing duplicates, adding new elements as needed for replacements, and then re-shuffling.

You may wish to generate more than N numbers to handle the expected number of duplications. The expected number of duplicates is about N(N-1)/2M so you might generate, say, N + 1.1N(N-1)/2M numbers in your initial list.

Sorting and removing duplicates is standard.

If you're not over-generating, or if you did but it wasn't enough, you can generate new elements as you originally suggested: generate a new number, test whether it's already in the list, and add it if not.

For re-shuffling you can either use a standard shuffle (like Fisher-Yates, mentioned above) or you can store the original order with each element and then order by that. If you do, and you over-generated originally, just ignore any extra elements at the end.

If performance is important, you can code both methods (and maybe several variants of the second, as described) and test to find the cutoff between the methods. If it isn't then the second method will work in all cases -- though if you don't need it the first is easier to program.
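A sketch of the second method under my own naming (`BulkUnique` is hypothetical; a `HashSet` stands in for "sorting and removing duplicates", and the padding follows the N + 1.1·N(N-1)/2M estimate above):

```csharp
// Over-generate, de-duplicate, top up to n, then Fisher-Yates shuffle.
static List<int> BulkUnique(int n, int m, Random random)
{
    // Expected duplicates are about n(n-1)/2m, so pad the initial batch by 1.1x that.
    int padded = n + (int)(1.1 * n * (long)(n - 1) / (2.0 * m));

    var unique = new HashSet<int>();
    for (int i = 0; i < padded; i++)
        unique.Add(random.Next(m));          // duplicates are silently ignored by the set

    while (unique.Count < n)                 // top up if the padding wasn't enough
        unique.Add(random.Next(m));

    var result = new List<int>(unique).GetRange(0, n);  // drop any extras
    for (int i = result.Count - 1; i > 0; i--)          // re-shuffle (Fisher-Yates)
    {
        int j = random.Next(i + 1);
        (result[i], result[j]) = (result[j], result[i]);
    }
    return result;
}
```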


My recommendation:

Input: natural number n, number of random elements required
Output: an array a of size n with n unique random numbers

Algorithm:

  1. Initialize an array to n elements

  2. Walk that array in ascending order, assigning each element a value that is a random amount larger than the previous one (use whatever random function you like); the strictly increasing values guarantee uniqueness

  3. Pick a natural number s. I would have to break my head again to figure out how to choose this value most efficiently, but for now I recommend just floor(n/2).

  4. for i:= 0 to s-1 do

  5. Pick 2 random numbers, l and r, that are both indices into the array a.

  6. Swap a[l] with a[r]

  7. Loop

  8. Output array a

Memory consumption is a concern with this one, and it only works as an offline algorithm.
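The recommendation above might look like this. It is a sketch under my own assumptions: the helper name `IncreasingThenSwap` is mine, and the step size of 1..99 is an arbitrary choice standing in for "whatever random function you may like":

```csharp
// Build a strictly increasing array (uniqueness for free), then break up
// the order with s = n/2 random swaps.
static int[] IncreasingThenSwap(int n, Random random)
{
    var a = new int[n];
    int value = 0;
    for (int i = 0; i < n; i++)
    {
        value += random.Next(1, 100);   // each element is 1..99 larger than the last
        a[i] = value;
    }

    int s = n / 2;                      // the suggested floor(n/2) swap count
    for (int k = 0; k < s; k++)
    {
        int l = random.Next(n), r = random.Next(n);
        (a[l], a[r]) = (a[r], a[l]);    // swap two random positions
    }
    return a;
}
```

Note that n/2 random swaps leave some positions untouched on average, so the result is less thoroughly mixed than a full Fisher-Yates shuffle.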


From the looks of it, what you want is called a (pseudo)random permutation of all int values. Note that you don't necessarily have to store these values in an array and shuffle them, consuming memory; you can generate them on demand by applying a bijective ("perfect") hash function to a counter.

Note that pseudo-random also means deterministic, so to get different values on different runs, you have to make sure to "seed" your sequence with a different value each time.
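As a minimal illustration of the idea (my own sketch, not a cryptographic-quality permutation): multiplication by any odd constant, plus a seed, is a bijection modulo 2^32, so feeding it 0, 1, 2, ... enumerates every uint exactly once without storing anything. The constant and the name `Permute` are arbitrary choices:

```csharp
// A weak but genuine bijection on 32-bit values: odd multipliers are
// invertible mod 2^32, and the seed varies the sequence between runs.
static uint Permute(uint index, uint seed)
{
    unchecked
    {
        return 2654435761u * index + seed;  // 2654435761 is odd, hence invertible
    }
}
```

A real application would want a stronger mixing function (e.g. several multiply/xor-shift rounds), but the bijectivity argument is the same.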

