How slow is python really? (Or how fast is your language?)

Question

I have this code which I have written in python/numpy

from __future__ import division
import numpy as np
import itertools

n=6
iters = 1000
firstzero = 0
bothzero = 0
""" The next line iterates over arrays of length n+1 which contain only -1s and 1s """
for S in itertools.product([-1,1], repeat = n+1):
    """For i from 0 to iters -1 """
    for i in xrange(iters):
        """ Choose a random array of length n.
            Prob 1/4 of being -1, prob 1/4 of being 1 and prob 1/2 of being 0. """
        F = np.random.choice(np.array([-1,0,0,1], dtype=np.int8), size = n)
        """The next loop just makes sure that F is not all zeros."""
        while np.all(F ==0):
            F = np.random.choice(np.array([-1,0,0,1], dtype=np.int8), size = n)
        """np.convolve(F,S,'valid') computes two inner products between
        F and the two successive windows of S of length n."""
        FS = np.convolve(F,S, 'valid')
        if (FS[0] == 0):
            firstzero += 1
        if np.all(FS==0):
            bothzero += 1

print "firstzero",    firstzero
print "bothzero",  bothzero

It is counting the number of times the convolution of two random arrays, one which is one longer than the other, with a particular probability distribution, has a 0 at the first position or a 0 in both positions.

I had a bet with a friend who says python is a terrible language to write code in that needs to be fast. It takes 9s on my computer. He says it could be made 100 times faster if written in a "proper language".

The challenge is to see if this code can indeed by made 100 times faster in any language of your choice. I will test your code and the fastest one week from now will win. If anyone gets below 0.09s then they automatically win and I lose.

Status A number of amazing answers. I will get them running on my machine as soon as possible to see if we already have a winner.

My Machine The timings will be run on my machine. This is a standard ubuntu install on an AMD FX-8350 Eight-Core Processor. This also means I need to be able to run your code.

Also, I'm pretty sure that you could blow this out of the water using OpenCL/CUDA. But my GPU programming skills are nonexistent, so I defer that to someone else. — nneonneo, yesterday
I love python. It's my favourite language, but don't let anyone mislead you into thinking it's fast. In my eyes, choosing Python over a language like C++ or something is choosing developer time over computation time. — undergroundmonorail, yesterday
@Manishearth CPython is interpreted. Don't confuse the language with the implementation. ;) — Terry Chia, yesterday
A language doesn't have a speed. It's like asking how fast is english? Runtime on a specific machine can have a speed but runtime != language. — Sylwester, 22 hours ago
"Languages don't have a speed" is like saying "running isn't faster than driving a car." — Venge, 14 hours ago

ace · Answer 1 · 2014-04-27 16:44:30Z

up vote 32 down vote

Python 2.7 - 0.882s

(OP's original: 6.404s)

import itertools
import operator
import random

n=6
iters = 1000
firstzero = 0
bothzero = 0

for S in itertools.product([-1,1], repeat = n+1):
    for i in xrange(iters):
        F = [random.choice([-1,0,0,1]) for j in xrange(n)]
        while not any(F):
            F = [random.choice([-1,0,0,1]) for j in xrange(n)]
        if not sum(map(operator.mul, F, S[:-1])):
            firstzero += 1
            if not sum(map(operator.mul, F, S[1:])):
                bothzero += 1

print "firstzero", firstzero
print "bothzero", bothzero

OP's original code uses such tiny arrays there is no benefit to using Numpy, as this pure python implementation demonstrates.

I also optimize by skipping the rest of the convolution if the first result isn't zero.

edited 22 hours ago

ace
4,88111047

answered yesterday

Alistair Buxton
55115

1

This approach also allows you to use pypy! – Lembik yesterday

5

With pypy this runs in about 0.5 seconds. – Alistair Buxton yesterday

You get a much more convincing speedup if you set n = 10. I get 19s versus 4.6s for cpython versus pypy. – Lembik 23 hours ago

Another optimization would be to precompute the possiblities for F because there are only 4032 of them. Define choicesF = filter(any, itertools.product([-1, 0, 0, 1], repeat=n)) outside of the loops. Then in the innerloop define F = random.choice(choicesF). I get a 3x speedup with such an approach. – Steven Rumbalski 2 hours ago

add comment

Kyle Kanos · Answer 2 · 2014-04-27 19:59:20Z

Python2.7 + Numpy 1.8.1: 10.242 s

Fortran 90+: 0.029 s 0.003 s 0.022 s

Damn straight you lost your bet! Not a drop of parallelization here too, just straight Fortran 90+.

EDIT I've taken Guy Sirton's algorithm for permuting the array S (good find :D). I apparently also had the -g -traceback compiler flags active which were slowing this code down to about 0.017s. Currently, I am compiling this as

ifort -fast -o convolve convolve_random_arrays.f90

For those who don't have ifort, you can use

gfortran -O3 -ffast-math -o convolve convolve_random_arrays.f90

EDIT 2: The decrease in run-time is because I was doing something wrong previously and got an incorrect answer. Doing it the right way is apparently slower. I still can't believe that C++ is faster than mine, so I'm probably going to spend some time this week trying to tweak the crap out of this to speed it up.

Here's the code:

program convolve_random_arrays
   implicit none
   integer :: n, p, pmax, iters, i, nil(0:1)
   integer, allocatable ::  F(:), S(:), FS(:)

   n = 6
   allocate(F(n), S(n+1), FS(2))
   iters = 1000
   nil = 0

   call init_random_seed()

   S = -1
   pmax = 2**(n+1)
   do p=1,pmax
      do i=1,iters

         do! while(all(F==0))
            F = rand_int_array(n)
            if(.not.all(f==0)) exit
         enddo

         FS = convolve(F,S)

         if(FS(1) == 0) then
            nil(0) = nil(0) + 1
            if(FS(2) == 0) nil(1) = nil(1) + 1
         endif

      enddo
      call permute(S)
   enddo

   print *,"first zero:",nil(0)
   print *," both zero:",nil(1)

 contains
   pure function convolve(x, h) result(y)
!x is the signal array
!h is the noise/impulse array
      integer, dimension(:), intent(in) :: x, h
      integer, dimension(abs(size(x)-size(h))+1) :: y
      integer:: i, j, r
      do i=1,size(y)
         r = 0
         do j=1,size(x)
            r = r+ x(j)*h(j+i-1)
         enddo
         y(i) = r
      enddo
   end function convolve

   pure subroutine permute(x)
      integer, intent(inout) :: x(:)
      integer :: i

      do i=1,size(x)
         if(x(i)==-1) then
            x(i) = 1
            return
         endif
         x(i) = -1
      enddo
   end subroutine permute

   function rand_int_array(i) result(x)
     integer, intent(in) :: i
     integer :: x(i), j
     real :: y
     do j=1,i
        call random_number(y)
        if(y <= 0.25) then
           x(j) = -1
        else if (y >= 0.75) then
           x(j) = +1
        else
           x(j) = 0
        endif
     enddo
   end function rand_int_array

   subroutine init_random_seed()
     integer :: i, n, clock
     integer, allocatable :: seed(:)

     call random_seed(size=n)
     allocate(seed(n))
     call system_clock(count=clock)
     seed = clock + 37*[ (i-1, i=1,n) ]
     call random_seed(put=seed)

     deallocate(seed)
   end subroutine init_random_seed
end program convolve_random_arrays

I suppose the question now is will you stop using slow-as-molasses Python and use fast-as-electrons-can-move Fortran ;).

Wouldn't the case statement be faster than a generator function anyway? Unless you're expecting some kind of branch-prediction/cache-line/etc speedup? — OrangeDog, yesterday
Speed should be compared on the same machine. What runtime did you get for the OP's code? — nbubis, yesterday
Though not specified in the question, I think the point is that generating the S array should not depend on what n is, i.e. your algorithm should work for all n, it's just that in this test case n happens to be 6, so I don't think that hard-coding the arrays is a good idea — ace, yesterday
This also doesn't generate the random array correctly. OP's code checks that at least one value in the random array is none-zero, and if it isn't, it generates a new random array. Don't ask me why it does this, but it does. — Alistair Buxton, yesterday

Guy Sirton · Answer 3 · 2014-04-27 20:23:40Z

C++ (VS 2012) - ~~0.026s~~ 0.015s

Python 2.7.6/Numpy 1.8.1 - 12s

Speedup ~x800.

The gap would be a lot smaller if the convolved arrays were very large...

#include <vector>
#include <iostream>
#include <ctime>

using namespace std;

static unsigned int seed = 35;

int my_random()
{
   seed = seed*1664525UL + 1013904223UL; // numerical recipes, 32 bit

   switch((seed>>30) & 3)
   {
   case 0: return 0;
   case 1: return -1;
   case 2: return 1;
   case 3: return 0;
   }
   return 0;
}

bool allzero(const vector<int>& T)
{
   for(auto x : T)
   {
      if(x!=0)
      {
         return false;
      }
   }
   return true;
}

void convolve(vector<int>& out, const vector<int>& v1, const vector<int>& v2)
{
   for(size_t i = 0; i<out.size(); ++i)
   {
      int result = 0;
      for(size_t j = 0; j<v2.size(); ++j)
      {
         result += v1[i+j]*v2[j];
      }
      out[i] = result;
   }
}

void advance(vector<int>& v)
{
   for(auto &x : v)
   {
      if(x==-1)
      {
         x = 1;
         return;
      }
      x = -1;
   }
}

void convolve_random_arrays(void)
{
   const size_t n = 6;
   const int two_to_n_plus_one = 128;
   const int iters = 1000;
   int bothzero = 0;
   int firstzero = 0;

   vector<int> S(n+1);
   vector<int> F(n);
   vector<int> FS(2);

   time_t current_time;
   time(&current_time);
   seed = current_time;

   for(auto &x : S)
   {
      x = -1;
   }
   for(int i=0; i<two_to_n_plus_one; ++i)
   {
      for(int j=0; j<iters; ++j)
      {
         do
         {
            for(auto &x : F)
            {
               x = my_random();
            }
         } while(allzero(F));
         convolve(FS, S, F);
         if(FS[0] == 0)
         {
            firstzero++;
            if(FS[1] == 0)
            {
               bothzero++;
            }
         }
      }
      advance(S);
   }
   cout << firstzero << endl; // This output can slow things down
   cout << bothzero << endl; // comment out for timing the algorithm
}

A few notes:

The random function is being called in the loop so I went for a very light weight linear congruential generator (but generously looked at the MSBs).
This is really just the starting point for an optimized solution.
Didn't take that long to write...
I iterate through all the values of S taking S[0] to be the "least significant" digit.

Add this main function for a self contained example:

int main(int argc, char** argv)
{
  for(int i=0; i<1000; ++i) // run 1000 times for stop-watch
  {
      convolve_random_arrays();
  }
}

Indeed. The tiny size of the arrays in OP's code means using numpy is actually an order of magnitude slower than straight python. — Alistair Buxton, yesterday
Very nice! I've increased the speed-up on my code because of your advance function, so my code is now faster than yours :P (but very good competition!) — Kyle Kanos, yesterday
I can't get this to compile. It says "error: ‘x’ does not name a type for(auto x : T) " and many other errors. How do you compile it? — Lembik, 23 hours ago
Can anyone else compile this code? I can't compile it at all. — Lembik, 20 hours ago

ace · Answer 4 · 2014-04-27 11:21:16Z

C

Takes ~ 0.032s (or ~ 0.030s with -O flag of gcc) on my machine, with OP's original code taking ~ 7.7s. Tried to optimize by generating the random array and convolving in the same loop, but it doesn't seem to make a lot of difference.

The first array is generated by taking an integer, write it out in binary, and change all 1 to -1 and all 0 to 1. The rest should be very straightforward.

Edit: instead of having n as an int, now we have n as a macro-defined constant, so we can use int arr[n]; instead of malloc.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define n (6)
#define iters (1000)
int main() {
    int firstzero=0, bothzero=0;
    int arr[n+1];
    unsigned int i, j;
    srand(time(NULL));

    for(i=0; i< 1<<(n+1) ; i++) {
        unsigned int tmp=i;
        for(j=0; j<n+1; j++) {
            arr[j]=(tmp&1)*(-2)+1;
            tmp>>=1;
        }
        for(j=0; j<iters; j++) {
            int randArr[n];
            unsigned int k, flag=0;
            int first=0, second=0;
            for(k=0; k<n; k++) {
                randArr[k]=rand()%4;
                if(randArr[k]==2) randArr[k]=0;
                if(randArr[k]==3) randArr[k]=-1;
                if(randArr[k]) flag++;
                if(k==n-1&&!flag) {
                    k=-1;
                    continue;
                }
                else {
                    first+=arr[k]*randArr[k];
                    second+=arr[k+1]*randArr[k];
                }
            }
            firstzero+=(!first);
            bothzero+=(!first&&!second);
        }
    }
    printf("firstzero %d\nbothzero %d\n", firstzero, bothzero);
    return 0;
}

I tested this. It is very fast (try n = 10) and gives correct looking output. Thank you. — Lembik, 22 hours ago
This implementation doesn't follow the original because if the random vector is all zeros only the last element would be re-generated. In the original the entire vector would be. You need to enclose that loop in do{}while(!flag) or something to that effect. I don't expect it will change the run-time much (may make it faster). — Guy Sirton, 17 hours ago
@Guy Sirton Notice that before the continue; statement I assigned -1 to k, so k will loop from 0 again. — ace, 16 hours ago
@ace ah! you're right. I was scanning too quickly and it looked like that was -= rather than =- :-) A while loop would be more readable. — Guy Sirton, 16 hours ago

dbaupp · Answer 5 · 2014-04-28 01:23:17Z

Rust: 0.011s

Original Python: 8.3

A straight translation of the original Python.

extern crate rand;

use rand::Rng;

static N: uint = 6;
static ITERS: uint = 1000;

fn convolve<T: Num>(into: &mut [T], a: &[T], b: &[T]) {
    // we want `a` to be the longest array
    if a.len() < b.len() {
        convolve(into, b, a);
        return
    }

    assert_eq!(into.len(), a.len() - b.len() + 1);

    for (n,place) in into.mut_iter().enumerate() {
        for (x, y) in a.slice_from(n).iter().zip(b.iter()) {
            *place = *place + *x * *y
        }
    }
}

fn main() {
    let mut first_zero = 0;
    let mut both_zero = 0;
    let mut rng = rand::XorShiftRng::new().unwrap();

    for s in PlusMinus::new() {
        for _ in range(0, ITERS) {
            let mut f = [0, .. N];
            while f.iter().all(|x| *x == 0) {
                for p in f.mut_iter() {
                    match rng.gen::<u32>() % 4 {
                        0 => *p = -1,
                        1 | 2 => *p = 0,
                        _ => *p = 1
                    }
                }
            }

            let mut fs = [0, .. 2];
            convolve(fs, s, f);

            if fs[0] == 0 { first_zero += 1 }
            if fs.iter().all(|&x| x == 0) { both_zero += 1 }
        }
    }

    println!("{}\n{}", first_zero, both_zero);
}



/// An iterator over [+-]1 arrays of the appropriate length
struct PlusMinus {
    done: bool,
    current: [i32, .. N + 1]
}
impl PlusMinus {
    fn new() -> PlusMinus {
        PlusMinus { done: false, current: [-1, .. N + 1] }
    }
}

impl Iterator<[i32, .. N + 1]> for PlusMinus {
    fn next(&mut self) -> Option<[i32, .. N+1]> {
        if self.done {
            return None
        }

        let ret = self.current;

        // a binary "adder", that just adds one to a bit vector (where
        // -1 is the zero, and 1 is the one).
        for (i, place) in self.current.mut_iter().enumerate() {
            *place = -*place;
            if *place == 1 {
                break
            } else if i == N {
                // we've wrapped, so we want to stop after this one
                self.done = true
            }
        }

        Some(ret)
    }
}

Compiled with --opt-level=3
My rust compiler is a recent nightly: (rustc 0.11-pre-nightly (eea4909 2014-04-24 23:41:15 -0700) to be precise)

I got it to compile using the nightly version of rust. However I think the code is wrong. The output should be something close to firstzero 27215 bothzero 12086. Instead it gives 27367 6481 — Lembik, 20 hours ago
@Lembik, whoops, got my as and bs mixed up in the convolution; fixed (doesn't change the runtime noticably). — dbaupp, 14 hours ago

algorithmshark · Answer 6 · 2014-04-27 00:33:18Z

J

I don't expect to beat out any compiled languages, and something tells me it'd take a miraculous machine to get less than 0.09 s with this, but I'd like to submit this J anyway, because it's pretty slick.

NB. constants
num =: 6
iters =: 1000

NB. convolve
NB. take the multiplication table                */
NB. then sum along the NE-SW diagonals           +//.
NB. and keep the longest ones                    #~ [: (= >./) #/.
NB. operate on rows of higher dimensional lists  " 1
conv =: (+//. #~ [: (= >./) #/.) @: (*/) " 1

NB. main program
S  =: > , { (num+1) # < _1 1                NB. all {-1,1}^(num+1)
F  =: (3&= - 0&=) (iters , num) ?@$ 4       NB. iters random arrays of length num
FS =: ,/ S conv/ F                          NB. make a convolution table
FB =: +/ ({. , *./)"1 ] 0 = FS              NB. first and both zero
('first zero ',:'both zero ') ,. ":"0 FB    NB. output results

This takes about 0.5 s on a laptop from the previous decade, only about 20x as fast as the Python in the answer. Most of the time is spent in conv because we write it lazily (we compute the entire convolution) and in full generality.

Since we know things about S and F, we can speed things up by making specific optimizations for this program. The best I've been able to come up with is conv =: ((num, num+1) { +//.)@:(*/)"1—select specifically the two numbers that correspond from the diagonal sums to the longest elements of the convolution—which approximately halves the time.

ace · Answer 7 · 2014-04-27 16:46:47Z

Perl - 89% improvement

On my ancient netbook, the OP's code takes 53 seconds to run; Alistair Buxton's version takes about 6.5 seconds, and the following Perl version takes about 5.7 seconds.

use v5.10;
use strict;
use warnings;

use Algorithm::Combinatorics qw( variations_with_repetition );
use List::Util qw( any sum );
use List::MoreUtils qw( pairwise );

my $n         = 6;
my $iters     = 1000;
my $firstzero = 0;
my $bothzero  = 0;

my $variations = variations_with_repetition([-1, 1], $n+1);
while (my $S = $variations->next)
{
  for my $i (1 .. $iters)
  {
    my @F;
    until (@F and any { $_ } @F)
    {
      @F = map +((-1,0,0,1)[rand 4]), 1..$n;
    }

    # The pairwise function doesn't accept array slices,
    # so need to copy into a temp array @S0
    my @S0 = @$S[0..$n-1];

    unless (sum pairwise { $a * $b } @F, @S0)
    {
      $firstzero++;
      my @S1 = @$S[1..$n];  # copy again :-(
      $bothzero++ unless sum pairwise { $a * $b } @F, @S1;
    }
  }
}

say "firstzero ", $firstzero;
say "bothzero ", $bothzero;

Thethos · Answer 8 · 2014-04-28 12:06:10Z

MATLAB 0.024s

Original Code: ~ 3.3 s

Alistar Buxton's Code: ~ 0.51 s

Matlab Code: ~ 0.024 s (Matlab already running)

I decided to give the oh so slow Matlab a try. If you know how, you can get rid of most of the loops (in Matlab), which makes it quite fast. However, the memory requirements are higher than for looped solutions but this will not be an issue if you don't have very large arrays...

function call_convolve_random_arrays
tic
convolve_random_arrays
toc
end

function convolve_random_arrays

n = 6;
iters = 1000;
firstzero = 0;
bothzero = 0;

rnd = [-1, 0, 0, 1];

S = -1 *ones(1, n + 1);

IDX1 = 1:n;
IDX2 = IDX1 + 1;

for i = 1:2^(n + 1)
    F = rnd(randi(4, [iters, n]));
    sel = ~any(F,2);
    while any(sel)
        F(sel, :) = rnd(randi(4, [sum(sel), n]));
        sel = ~any(F,2);
    end

    sum1 = F * S(IDX1)';
    sel = sum1 == 0;
    firstzero = firstzero + sum(sel);

    sum2 = F(sel, :) * S(IDX2)';
    sel = sum2 == 0;
    bothzero = bothzero + sum(sel);

    S = permute(S); 
end

fprintf('firstzero %i \nbothzero %i \n', firstzero, bothzero);

end

function x = permute(x)

for i=1:length(x)
    if(x(i)==-1)
        x(i) = 1;
            return
    end
        x(i) = -1;
end

end

Here is what I do:

use Kyle Kanos function to permute through S
calculate all n * iters random numbers at once
map 1 to 4 to [-1 0 0 1]
use Matrix multiplication (elementwise sum(F * S(1:5)) is equal to matrix multiplication of F * S(1:5)'
for bothzero: only calculate members that fullfill the first condition

I assume you don't have matlab, which is too bad as I really would have liked to see how it compares...

(The function can be slower the first time you run it.)

Well I have octave if you can make it work for that...? – Lembik 3 hours ago
I can give it a try - I never worked with octave, though. – Thethos 2 hours ago

Bob · Answer 9 · 2014-04-27 11:00:24Z

C# 0.135s

C# based on Alistair Buxton's plain python: 0.278s
Parallelised C#: 0.135s
Python from the question: 5.907s
Alistair's plain python: 0.853s

I'm not actually certain this implementation is correct - its output is different, if you look at the results down at the bottom.

There's certainly more optimal algorithms. I just decided to use a very similar algorithm to the Python one.

Single-threaded C

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConvolvingArrays
{
    static class Program
    {
        static void Main(string[] args)
        {
            int n=6;
            int iters = 1000;
            int firstzero = 0;
            int bothzero = 0;

            int[] arraySeed = new int[] {-1, 1};
            int[] randomSource = new int[] {-1, 0, 0, 1};
            Random rand = new Random();

            foreach (var S in Enumerable.Repeat(arraySeed, n+1).CartesianProduct())
            {
                for (int i = 0; i < iters; i++)
                {
                    var F = Enumerable.Range(0, n).Select(_ => randomSource[rand.Next(randomSource.Length)]);
                    while (!F.Any(f => f != 0))
                    {
                        F = Enumerable.Range(0, n).Select(_ => randomSource[rand.Next(randomSource.Length)]);
                    }
                    if (Enumerable.Zip(F, S.Take(n), (f, s) => f * s).Sum() == 0)
                    {
                        firstzero++;
                        if (Enumerable.Zip(F, S.Skip(1), (f, s) => f * s).Sum() == 0)
                        {
                            bothzero++;
                        }
                    }
                }
            }

            Console.WriteLine("firstzero {0}", firstzero);
            Console.WriteLine("bothzero {0}", bothzero);
        }

        // itertools.product?
        // http://ericlippert.com/2010/06/28/computing-a-cartesian-product-with-linq/
        static IEnumerable<IEnumerable<T>> CartesianProduct<T>
            (this IEnumerable<IEnumerable<T>> sequences)
        {
            IEnumerable<IEnumerable<T>> emptyProduct =
              new[] { Enumerable.Empty<T>() };
            return sequences.Aggregate(
              emptyProduct,
              (accumulator, sequence) =>
                from accseq in accumulator
                from item in sequence
                select accseq.Concat(new[] { item }));
        }
    }
}

Parallel C#:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace ConvolvingArrays
{
    static class Program
    {
        static void Main(string[] args)
        {
            int n=6;
            int iters = 1000;
            int firstzero = 0;
            int bothzero = 0;

            int[] arraySeed = new int[] {-1, 1};
            int[] randomSource = new int[] {-1, 0, 0, 1};

            ConcurrentBag<int[]> results = new ConcurrentBag<int[]>();

            // The next line iterates over arrays of length n+1 which contain only -1s and 1s
            Parallel.ForEach(Enumerable.Repeat(arraySeed, n + 1).CartesianProduct(), (S) =>
            {
                int fz = 0;
                int bz = 0;
                ThreadSafeRandom rand = new ThreadSafeRandom();
                for (int i = 0; i < iters; i++)
                {
                    var F = Enumerable.Range(0, n).Select(_ => randomSource[rand.Next(randomSource.Length)]);
                    while (!F.Any(f => f != 0))
                    {
                        F = Enumerable.Range(0, n).Select(_ => randomSource[rand.Next(randomSource.Length)]);
                    }
                    if (Enumerable.Zip(F, S.Take(n), (f, s) => f * s).Sum() == 0)
                    {
                        fz++;
                        if (Enumerable.Zip(F, S.Skip(1), (f, s) => f * s).Sum() == 0)
                        {
                            bz++;
                        }
                    }
                }

                results.Add(new int[] { fz, bz });
            });

            foreach (int[] res in results)
            {
                firstzero += res[0];
                bothzero += res[1];
            }

            Console.WriteLine("firstzero {0}", firstzero);
            Console.WriteLine("bothzero {0}", bothzero);
        }

        // itertools.product?
        // http://ericlippert.com/2010/06/28/computing-a-cartesian-product-with-linq/
        static IEnumerable<IEnumerable<T>> CartesianProduct<T>
            (this IEnumerable<IEnumerable<T>> sequences)
        {
            IEnumerable<IEnumerable<T>> emptyProduct =
              new[] { Enumerable.Empty<T>() };
            return sequences.Aggregate(
              emptyProduct,
              (accumulator, sequence) =>
                from accseq in accumulator
                from item in sequence
                select accseq.Concat(new[] { item }));
        }
    }

    // http://stackoverflow.com/a/11109361/1030702
    public class ThreadSafeRandom
    {
        private static readonly Random _global = new Random();
        [ThreadStatic]
        private static Random _local;

        public ThreadSafeRandom()
        {
            if (_local == null)
            {
                int seed;
                lock (_global)
                {
                    seed = _global.Next();
                }
                _local = new Random(seed);
            }
        }
        public int Next()
        {
            return _local.Next();
        }
        public int Next(int maxValue)
        {
            return _local.Next(maxValue);
        }
    }
}

Test output:

Windows (.NET)

The C# is much faster on Windows. Probably because .NET is faster than mono.

User and sys timing doesn't seem to work (used git bash for timing).

$ time /c/Python27/python.exe numpypython.py
firstzero 27413
bothzero 12073

real    0m5.907s
user    0m0.000s
sys     0m0.000s
$ time /c/Python27/python.exe plainpython.py
firstzero 26983
bothzero 12033

real    0m0.853s
user    0m0.000s
sys     0m0.000s
$ time ConvolvingArrays.exe
firstzero 28526
bothzero 6453

real    0m0.278s
user    0m0.000s
sys     0m0.000s
$ time ConvolvingArraysParallel.exe
firstzero 28857
bothzero 6485

real    0m0.135s
user    0m0.000s
sys     0m0.000s

Linux (mono)

bob@phoebe:~/convolvingarrays$ time python program.py
firstzero 27059
bothzero 12131

real    0m11.932s
user    0m11.912s
sys     0m0.012s
bob@phoebe:~/convolvingarrays$ mcs -optimize+ -debug- program.cs
bob@phoebe:~/convolvingarrays$ time mono program.exe
firstzero 28982
bothzero 6512

real    0m1.360s
user    0m1.532s
sys     0m0.872s
bob@phoebe:~/convolvingarrays$ mcs -optimize+ -debug- parallelprogram.cs
bob@phoebe:~/convolvingarrays$ time mono parallelprogram.exe
firstzero 28857
bothzero 6496

real    0m0.851s
user    0m2.708s
sys     0m3.028s

I don't think the code is correct as you say. The outputs are not right. — Lembik, 22 hours ago
@Lembik Yea. I'd appreciate it if someone could tell me where it's wrong, though - I can't figure it out (only having a minimal understanding of what it's supposed to do doesn't help). — Bob, 13 hours ago

Arman · Answer 10 · 2014-04-27 13:51:10Z

Julia: 12.149 6.929 s

Despite their claims to speed, the initial JIT compilation time holds us back!

Note that the following Julia code is effectively a direct translation of the original Python code (no optimisations made) as a demonstration that you can easily transfer your programming experience to a faster language ;)

require("Iterators")

n = 6
iters = 1000
firstzero = 0
bothzero = 0

for S in Iterators.product(fill([-1,1], n+1)...)
    for i = 1:iters
        F = [[-1 0 0 1][rand(1:4)] for _ = 1:n]
        while all((x) -> round(x,8) == 0, F)
            F = [[-1 0 0 1][rand(1:4)] for _ = 1:n]
        end
        FS = conv(F, [S...])
        if round(FS[1],8) == 0
            firstzero += 1
        end
        if all((x) -> round(x,8) == 0, FS)
            bothzero += 1
        end
    end
end

println("firstzero ", firstzero)
println("bothzero ", bothzero)

Edit

Running with n = 8 takes 32.935 s. Considering that the complexity of this algorithm is O(2^n), then 4 * (12.149 - C) = (32.935 - C), where C is a constant representing the JIT compilation time. Solving for C we find that C = 5.2203, suggesting that actual execution time for n = 6 is 6.929 s.

How about increasing n to 8 to see if Julia comes into its own then? — Lembik, yesterday

asked	yesterday
viewed	3747 times
active	today

current community

your communities

more stack exchange communities

How slow is python really? (Or how fast is your language?)

10 Answers

Python2.7 + Numpy 1.8.1: 10.242 s

Fortran 90+: 0.029 s 0.003 s 0.022 s

C

Rust: 0.011s

Original Python: 8.3

J

MATLAB 0.024s

C# 0.135s

Single-threaded C

Parallel C#:

Test output:

Windows (.NET)

Linux (mono)

Julia: 12.149 6.929 s

Edit

Your Answer

Not the answer you're looking for? Browse other questions tagged fastest-code or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

How slow is python really? (Or how fast is your language?)

10 Answers

Python2.7 + Numpy 1.8.1: 10.242 s

Fortran 90+: 0.029 s 0.003 s 0.022 s

C

Rust: 0.011s

Original Python: 8.3

J

MATLAB 0.024s

C# 0.135s

Single-threaded C

Parallel C#:

Test output:

Windows (.NET)

Linux (mono)

Julia: 12.149 6.929 s

Edit

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged fastest-code or ask your own question.

Related

Hot Network Questions