Yes, there definitely is.
You generate a collection of elements, shuffle it around and start pulling items out of it. A quick one-liner would be:
Enumerable.Range(0,100).OrderBy(x => Guid.NewGuid()).Take(20);
or alternatively, with an existing Random instance:
Enumerable.Range(0,100).OrderBy(x => random.Next()).Take(20);
This will give you 20 unique random values from the range 0 to 99.
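To see why the result can never contain duplicates, here's a minimal runnable version of the Random-based variant (the Guid-based one behaves identically). Since OrderBy only reorders the source and the source elements are already distinct, every pick is distinct by construction:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        var random = new Random();

        // Shuffle the full 0..99 range, then keep the first 20 elements.
        var picks = Enumerable.Range(0, 100)
                              .OrderBy(x => random.Next())
                              .Take(20)
                              .ToList();

        Console.WriteLine(string.Join(", ", picks));
        Console.WriteLine(picks.Distinct().Count()); // always 20: duplicates are impossible
    }
}
```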
The difference from your approach is that yours has a worst-case runtime of infinity: if you are really unlucky, you could keep drawing values you've already seen and never reach your required amount of random values.
My approach, on the other hand, first generates all 100 values and then takes a subset from it, which you can criticize for its memory impact. Considering you used random.Next(), whose range covers half of all integers (0 to int.MaxValue), you should indeed be wary of this: materializing that entire range would have a huge memory impact.
It also depends on your specific situation: if you have a very large pool of values (say 1,000,000) and you need 2 random values, then your approach is much better. But when you need 999,999 values from that same pool, my approach wins out, because yours will struggle to produce those last values.
It takes a lot of time to generate those last values, as you can test for yourself with this:
void Main()
{
    var random = new Random();
    var times = new TimeSpan[512];
    var sw = new Stopwatch();

    for (int i = 0; i < times.Length; i++)
    {
        sw.Restart();
        // Keep drawing until we hit one of the 11 target values.
        while (true)
        {
            int rand = random.Next();
            if (rand < 7894512 && rand > 7894500)
            {
                break;
            }
        }
        sw.Stop();

        times[i] = sw.Elapsed;
        Console.WriteLine("Elapsed time: " + sw.Elapsed);
    }

    // Print the measurements from fastest to slowest.
    foreach (var time in times.OrderBy(x => x))
    {
        Console.WriteLine(time);
    }
}
It loops 512 times; each iteration keeps drawing random values until it hits one of the (arbitrarily picked by myself) values between 7894500 and 7894512, and records how long that took. When you execute this you'll notice the spread is enormous: sometimes it's fast and takes 39 ms, other times it takes over a minute. The sorted output makes this evident: fast at the start, slow at the end.
Compare that to the memory overhead of my approach, which for a comparably sized population would first allocate 32 million integers plus guids, padding, object overhead, etc., and you're out a big chunk of memory.
You might be able to improve it by using a more "real" shuffling algorithm, such as Fisher-Yates, which doesn't have the object and guid overhead.
Ultimately in an extreme situation where you need 32 million unique values in a randomized order out of a total population of 32 million + 1, you will either have to accept a big memory overhead or a big execution time overhead.
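As a sketch of that "more real" shuffle: a partial Fisher-Yates still allocates the full population array, but it drops the Guid/OrderBy overhead and only shuffles the first count positions. The method name TakeRandom is my own invention for illustration:

```csharp
using System;
using System.Linq;

class Program
{
    // Partial Fisher-Yates: shuffle only the first `count` slots, each time
    // swapping in a uniformly chosen element from the not-yet-shuffled tail.
    static int[] TakeRandom(int populationSize, int count, Random random)
    {
        var pool = Enumerable.Range(0, populationSize).ToArray();
        for (int i = 0; i < count; i++)
        {
            int j = random.Next(i, populationSize); // index into the remaining tail
            (pool[i], pool[j]) = (pool[j], pool[i]);
        }
        return pool.Take(count).ToArray();
    }

    static void Main()
    {
        var result = TakeRandom(100, 20, new Random());
        Console.WriteLine(string.Join(", ", result));
    }
}
```

This stops as soon as the requested slots are filled, so taking 20 out of 100 does 20 swaps instead of sorting all 100 elements.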
One last edit before this topic can be laid to rest from my part: I talked about it with @rolfl in chat and we have come to the conclusion that each of our solutions has its use, depending on what your situation is exactly. Summarized, it comes down to this:
If you have a big range of values (like 0 to int.MaxValue) then my solution will eat all the memory your PC has. You're looking at two collections of 2.1 billion integers each, from which you then take a slice.
In my solution you first allocate this entire population, then you sort it (a different data structure) and then you take some of it. If this "some" is not close to 2.1 billion, you will have paid a huge allocation cost for data you never use.
How does this compare to @rolfl's answer? He basically allocates the data as it is needed: if he needs 32 million values then he only allocates those 32 million (x2) and no more. If he needs 2.1 billion then he ends up with a memory footprint like mine, but that's an uncommon worst-case scenario for him, while for me it is the standard behaviour.
The main disadvantage of his approach is that as the amount of values you want approaches the total population, it becomes increasingly harder to get those last values because there will be many collisions. Yet again, this is only a problem when the population is actually really big.
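For comparison, here is a sketch of that collision-based alternative (my own reconstruction of the idea, not @rolfl's exact code): keep drawing until the set holds enough distinct values. Memory is proportional to count rather than to the population, but the loop slows down as count approaches populationSize:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    // Rejection sampling: draw until we have `count` distinct values.
    static HashSet<int> DrawDistinct(int populationSize, int count, Random random)
    {
        var picked = new HashSet<int>();
        while (picked.Count < count)
        {
            picked.Add(random.Next(populationSize)); // Add is a no-op on a collision
        }
        return picked;
    }

    static void Main()
    {
        // 2 values out of 1,000,000: almost no collisions, finishes instantly.
        var result = DrawDistinct(1_000_000, 2, new Random());
        Console.WriteLine(result.Count); // 2
    }
}
```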
So when should you use my solution? Like most things, it's a tradeoff between readability and performance. If you have a relatively small population and need a large share of it, then in my opinion the readability wins out over the performance impact. And when the population is relatively small and the amount of values you want is near its size, my solution will have a memory impact similar to the other approach while also avoiding the many collisions.
Take your pick depending on what your situation is.