You are in for a treat, first of all a style review of existing code, followed by a code review of your current code, followed by a code refactoring, and topped of with a performance review of current solutions. All a part of the experience here at Code Review for free! Sit back, and enjoy. :-)
Style review
Actually it seems like you've done some coding before, and so most of the style is quite good. There are however some issues, and they are all related to naming:
- Class names is suggested to be CamelCase – And neither
coin
or trial
is good class names. When looking at the function they provide better names could possibly be CoinFlip
and RepeatedCoinFlip
- Don't use abbreviations in any names – What is it with
v_rand
or C_RAND_IDX
and most other variable names? When you have to guess what the variable is, then it is a bad choice of variable name
- Don't let a function name contain the parameter – It is quite clear that both the
flip10
and flip1000
are named after something which should be a parameter. This is not dynamic, and doesn't easily extend. What if you wanted the flip10
to flip it a hundred times?
- Bonus point for using
xrange
in Python 2.7
Code review
There are several code smells related to your code, sadly enough, which affects both readability as well as efficiency of your code. Lets address these:
- Letting the
__init__
do all the work – When you can't do anything with the class except what is done through the initialisation of the class, you should reconsider whether you are on the right path or not. In most cases this should be written as a simple function. If insisting on using class, you should strip down the initialisation, and let methods do the actual work
- Initialisation of
C_RAND_IDX
should most likely be in __init__
– I'm guessing, but I'm assuming that this should change for every instantation of the trial
class. In this case, it should not be a constant, or it should be set within the __init__
method
- Hardcoding of parameters – Having methods like
flip10
and flip1000
is going against what we have methods for. They should really have a parameter allowing for this number to change. You could have a default parameter, but it should be possible to change it
- Calculating coin frequency using arrays and float calculation is waste of memory and calculation power – In your current code there is no reason to store every coin toss in an array when you all you need is the sum of all coin tosses (which represent the tail coin tosses). You could easily sum it directly. (Taking it all the way, you could even do a
random.randit(0, coin.FLIP_TIMES)
instead of the sum :) )
- Simplify coin toss – To rebuild the
choices = [coin.HEADS, coin.TAILS]
and use random.choice()
seems like waisted energy. Why not simply use random.randint(0, 2)
which will give the same result?
- Why use classes, when methods suffice? – As already touched upon, your scenario doesn't seem to be a good case for classes as they only provide one single return in both cases. This could most likely be better handled in methods directly.
- Avoid extensive float calculations, if possible – Aritmethic with floats are more expensive then int arithmetic. Int comparision is also cheaper than float comparison. In other words, why compute the frequency as a float when the integer is just as good. You can always return the frequency as a float in the return statement. Which would reduce from 100 000 times * 1000 * 10 float operations, to 100 000 * 3 float operations. In total that is 333 333 less float division operations (and comparison and so on). That should be noticeable.
Code refactor
Lets give a pure methods implementation (before comparing some of these methods).
import random
def coin_flip(flip_count):
"""Returns how mains tails, 1, out flip_count coin flips"""
return sum(random.randint(0, 2) for _ in xrange(flip_count))
def fake_coin_flip(flip_count):
"""Returns a random tails count"""
return random.randint(0, flip_count+1)
def repeated_coin_flip(repetitions=1000,
flip_count=10,
coin_flip_function=coin_flip):
"""Repeat some frequencies from a large repetition of coin flip sequences.
Repeat <repetitions> of <flip_count> coin flips. Out of these return three
frequencies, namely the first, the minimum and a random frequency.
"""
minimum_count = repetitions + 1
random_index = random.randint(0, repetitions)
for i in xrange(repetitions):
count_of_tails = coin_flip_function(flip_count)
#print('coin_flip: {}'.format(count_of_tails))
if count_of_tails < minimum_count:
minimum_count = count_of_tails
if i == random_index:
random_count = count_of_tails
if i == 0:
first_count = count_of_tails
flip_count_as_float = float(flip_count)
return {'rand': random_count / flip_count_as_float,
'first': first_count / flip_count_as_float,
'min': minimum_count / flip_count_as_float }
Performance review
I've timed four solution based on answers so far: Original solution by compguy24, solution by SuperBiasedMan, and two variations of mine solution. In order to time them I've done a few adaptations, and only focus on the build of the data list. Here is the interesting part of the performance setup:
# Helper function to simulate the need of the full data set
# Could print out the first two, and last two elements
def print_some_data(data):
"""Print the two first and two last data elements"""
for start_element in data[:2]:
print(' first: {:,.4f}, min: {:,.4f}, rand: {:,.4f}'.format(
start_element['first'], start_element['min'], start_element['rand']))
print(' ... skipping loads of items ...')
for end_element in data[-2:]:
print(' first: {:,.4f}, min: {:,.4f}, rand: {:,.4f}'.format(
end_element['first'], end_element['min'], end_element['rand']))
# After renaming classes to Coin & Trial
def compguy24_solution(datasize=1000, print_data = True):
print(' Generating {} elements'.format(datasize))
data = [Trial().results for i in xrange(datasize)]
if print_data:
print_some_data(data)
def SuperBiasedMan_solution(datasize=1000, print_data=True):
print(' Generating {} elements'.format(datasize))
data = [trial() for _ in xrange(datasize)]
if print_data:
print_some_data(data)
def holroy_solution(datasize=1000, print_data=True):
print(' Generating {} elements'.format(datasize))
data = [repeated_coin_flip(1000, 10) for _ in xrange(datasize)]
if print_data:
print_some_data(data)
# Same as previous, but faking the coin_flip :-D
def holroy_solution_v2(datasize=1000, print_data=True):
print(' Generating {} elements'.format(datasize))
data = [repeated_coin_flip(1000, 10, fake_coin_flip) for _ in xrange(datasize)]
if print_data:
print_some_data(data)
def main():
test_case = "from {0} import {1}; {1}({2}, False)"
for test_function in ('compguy24_solution',
'SuperBiasedMan_solution',
'holroy_solution',
'holroy_solution_v2',):
print ('\nTesting {}'.format(test_function))
datasize = 1000
print(' execution time: {:,.4f} seconds'.format(
timeit.timeit(test_case.format(__name__, test_function, datasize),
number=1)))
if __name__ == '__main__':
main()
Updated: I did a few test runs to compare the different solutions, before doing a final test run of 100 000 times (which I let run over night). With a 1000 samples I found the following: Original code (58.6 seconds) and the version by SuperBiasedMan (56.5 seconds) runs in around a minute, whilst my version using int comparison and simpler random function runs in only 2.8 seconds (about 20 times faster). ( And if faking the coin flip, it only takes 0.5 seconds to complete it. :-) )
But here are the ultimate test for a 100 000 times run on my computer:
Testing compguy24_solution
Generating 100000 elements
execution time: 5,600.1455 seconds
Testing SuperBiasedMan_solution
Generating 100000 elements
execution time: 5,565.2747 seconds
Testing holroy_solution
Generating 100000 elements
execution time: 277.3780 seconds
Testing holroy_solution_v2
Generating 100000 elements
execution time: 43.8776 seconds
That is the original and the solution by SuperBiasedMan took around 1.5 hours to complete, whilst my solution took 4.5 minutes. (The fake coin_flip variant under a minute). I think it is somewhat clear which solution I would prefer! Sorry guys! :-D
choices = [coin.HEADS, coin.TAILS
to global scope gives slight performance improvement. However, since your loop counts are constants, I suspect operating on a single large array may be faster than an object oriented solution. \$\endgroup\$