Take the 2-minute tour ×
Programming Puzzles & Code Golf Stack Exchange is a question and answer site for programming puzzle enthusiasts and code golfers. It's 100% free, no registration required.

I have this code which I have written in python/numpy

from __future__ import division
import numpy as np
import itertools

n=6
iters = 1000
firstzero = 0
bothzero = 0
""" The next line iterates over arrays of length n+1 which contain only -1s and 1s """
for S in itertools.product([-1,1], repeat = n+1):
    """For i from 0 to iters -1 """
    for i in xrange(iters):
        """ Choose a random array of length n.
            Prob 1/4 of being -1, prob 1/4 of being 1 and prob 1/2 of being 0. """
        F = np.random.choice(np.array([-1,0,0,1], dtype=np.int8), size = n)
        """The next loop just makes sure that F is not all zeros."""
        while np.all(F ==0):
            F = np.random.choice(np.array([-1,0,0,1], dtype=np.int8), size = n)
        """np.convolve(F,S,'valid') computes two inner products between
        F and the two successive windows of S of length n."""
        FS = np.convolve(F,S, 'valid')
        if (FS[0] == 0):
            firstzero += 1
        if np.all(FS==0):
            bothzero += 1

print "firstzero",    firstzero
print "bothzero",  bothzero

It is counting the number of times the convolution of two random arrays, one which is one longer than the other, with a particular probability distribution, has a 0 at the first position or a 0 in both positions.

I had a bet with a friend who says python is a terrible language to write code in that needs to be fast. It takes 9s on my computer. He says it could be made 100 times faster if written in a "proper language".

The challenge is to see if this code can indeed by made 100 times faster in any language of your choice. I will test your code and the fastest one week from now will win. If anyone gets below 0.09s then they automatically win and I lose.

Status A number of amazing answers. I will get them running on my machine as soon as possible to see if we already have a winner.

My Machine The timings will be run on my machine. This is a standard ubuntu install on an AMD FX-8350 Eight-Core Processor. This also means I need to be able to run your code.

share|improve this question
1  
Also, I'm pretty sure that you could blow this out of the water using OpenCL/CUDA. But my GPU programming skills are nonexistent, so I defer that to someone else. –  nneonneo 19 hours ago
1  
Hmm, if I knew what any of this did, I might be able to do it in Fortran 90+. –  Kyle Kanos 18 hours ago
2  
I'm pretty sure you'll loose your bet. –  Michael 18 hours ago
5  
I love python. It's my favourite language, but don't let anyone mislead you into thinking it's fast. In my eyes, choosing Python over a language like C++ or something is choosing developer time over computation time. –  undergroundmonorail 17 hours ago
2  
@DavidZ Heh, true, though what the VM does is "interpreted" in a sense. Terminology issue, though. –  Manishearth 16 hours ago
show 15 more comments

9 Answers

Python2.7 + Numpy 1.8.1: 10.242 s

Fortran 90+: 0.029 s

Damn straight you lost your bet! Not a drop of parallelization here too, just straight Fortran 90+.

program convolve_random_arrays
   implicit none
   integer :: F(6), S(7), FS(6), n, p, iters, i, nil(0:1)

   n = 6
   iters = 1000
   nil = 0
   do p=1,128
      S = set_S_array(i)
      do i=1,iters
         F = 0
         do 
            if(.not.all(F==0)) exit
            F = rand_int_array(n)
         enddo
         FS = convolve(F,S)
         if(FS(1) == 0) nil(0) = nil(0) + 1
         if(ALL(FS == 0)) nil(1) = nil(1) + 1
      enddo
   enddo

   print *,"first zero:",nil(0)
   print *," both zero:",nil(1)

 contains
   function convolve(x, h) result(y)
!x is the signal array
!h is the noise/impulse array
      integer, dimension(:), intent(in) :: x, h
      integer, dimension(size(x)) :: y

      integer :: kernelsize, datasize
      integer :: i,j,k

      datasize = size(x)
      kernelsize = size(h)

      !last part
      do i=kernelsize,datasize
         y(i) = 0
         j=i
         do k=1,kernelsize
            y(i) = y(i) + x(j)*h(k)
            j = j-1
         end do
      end do

      !first part
      do i=1,kernelsize
         y(i) = 0
         j=i
         k=1
         do while (j > 0)
            y(i) = y(i) + x(j)*h(k)
            j = j-1
            k = k+1
         end do
      end do

   end function convolve

   function set_S_array(i) result(S)
      integer, intent(in) :: i
      integer :: S(7)
      select case(i)
      case(1)
         S = [-1, -1, -1, -1, -1, -1, -1]
      case(2)
         S = [-1, -1, -1, -1, -1, -1, 1]

! ... skip 125 lines. I have no idea how itertools.product 
!     works, so I just copied the permutations from your code 
!     the code *could* be faster if I knew a quick formula for
!     the different permutations ...

      case(126)
         S = [1, 1, 1, 1, 1, -1, 1]
      case(127)
         S = [1, 1, 1, 1, 1, 1, -1]
      case(128)
         S = [1, 1, 1, 1, 1, 1, 1]
      end select
   end function set_S_array

   function rand_int_array(i) result(x)
     integer, intent(in) :: i
     integer :: x(i), j
     real :: y
     do j=1,i
        call random_number(y)
        if(y <= 0.25) then
           x(j) = -1
        else if (y > 0.75) then
           x(j) = +1
        else
           x(j) = 0
        endif
     enddo
   end function rand_int_array
end program convolve_random_arrays

I suppose the question now is will you stop using slow-as-molasses Python and use fast-as-electrons-can-move Fortran ;).

share|improve this answer
    
Excellent +1 :) –  Timtech 18 hours ago
1  
Wouldn't the case statement be faster than a generator function anyway? Unless you're expecting some kind of branch-prediction/cache-line/etc speedup? –  OrangeDog 17 hours ago
8  
Speed should be compared on the same machine. What runtime did you get for the OP's code? –  nbubis 16 hours ago
    
Though not specified in the question, I think the point is that generating the S array should not depend on what n is, i.e. your algorithm should work for all n, it's just that in this test case n happens to be 6, so I don't think that hard-coding the arrays is a good idea –  ace 14 hours ago
    
This also doesn't generate the random array correctly. OP's code checks that at least one value in the random array is none-zero, and if it isn't, it generates a new random array. Don't ask me why it does this, but it does. –  Alistair Buxton 11 hours ago
show 5 more comments

Python 2.7 - 0.882s

(OP's original: 6.404s)

import itertools
import operator
import random

n=6
iters = 1000
firstzero = 0
bothzero = 0

for S in itertools.product([-1,1], repeat = n+1):
    for i in xrange(iters):
        F = [random.choice([-1,0,0,1]) for j in xrange(n)]
        while not any(F):
            F = [random.choice([-1,0,0,1]) for j in xrange(n)]
        if not sum(map(operator.mul, F, S[:-1])):
            firstzero += 1
            if not sum(map(operator.mul, F, S[1:])):
                bothzero += 1

print "firstzero", firstzero
print "bothzero", bothzero

OP's original code uses such tiny arrays there is no benefit to using Numpy, as this pure python implementation demonstrates.

I also optimize by skipping the rest of the convolution if the first result isn't zero.

share|improve this answer
    
This approach also allows you to use pypy! –  Lembik 8 hours ago
    
With pypy this runs in about 0.5 seconds. –  Alistair Buxton 1 hour ago
add comment

C++ (VS 2012) - 0.026s 0.015s

Python 2.7.6/Numpy 1.8.1 - 12s

Speedup ~x800.

The gap would be a lot smaller if the convolved arrays were very large...

#include <vector>
#include <iostream>

using namespace std;

int my_random()
{
   static unsigned int seed = 35;
   seed = seed*1664525UL + 1013904223UL; // numerical recipes, 32 bit

   switch((seed>>30) & 3)
   {
   case 0: return 0;
   case 1: return -1;
   case 2: return 1;
   case 3: return 0;
   }
   return 0;
}

bool allzero(const vector<int>& T)
{
   for(auto x : T)
   {
      if(x!=0)
      {
         return false;
      }
   }
   return true;
}

void convolve(vector<int>& out, const vector<int>& v1, const vector<int>& v2)
{
   for(size_t i = 0; i<out.size(); ++i)
   {
      int result = 0;
      for(size_t j = 0; j<v2.size(); ++j)
      {
         result += v1[i+j]*v2[j];
      }
      out[i] = result;
   }
}

void advance(vector<int>& v)
{
   for(auto &x : v)
   {
      if(x==-1)
      {
         x = 1;
         return;
      }
      x = -1;
   }
}

void convolve_random_arrays(void)
{
   const size_t n = 6;
   const int two_to_n_plus_one = 128;
   const int iters = 1000;
   int bothzero = 0;
   int firstzero = 0;

   vector<int> S(n+1);
   vector<int> F(n);
   vector<int> FS(2);

   for(auto &x : S)
   {
      x = -1;
   }
   for(int i=0; i<two_to_n_plus_one; ++i)
   {
      for(int j=0; j<iters; ++j)
      {
         do
         {
            for(auto &x : F)
            {
               x = my_random();
            }
         } while(allzero(F));
         convolve(FS, S, F);
         if(FS[0] == 0)
         {
            firstzero++;
            if(FS[1] == 0)
            {
               bothzero++;
            }
         }
      }
      advance(S);
   }
}

A few notes:

  • The random function is being called in the loop so I went for a very light weight linear congruential generator (but generously looked at the MSBs).
  • This is really just the starting point for an optimized solution.
  • Didn't take that long to write...
  • I iterate through all the values of S taking S[0] to be the "least significant" digit.
share|improve this answer
    
Indeed. The tiny size of the arrays in OP's code means using numpy is actually an order of magnitude slower than straight python. –  Alistair Buxton 9 hours ago
    
Now x800 is what I am talking about! –  Lembik 8 hours ago
add comment

Rust: 0.011s

Original Python: 8.3

extern crate rand;

use rand::Rng;

static N: uint = 6;
static ITERS: uint = 1000;

fn convolve<T: Num>(into: &mut [T], a: &[T], b: &[T]) {
    // we want `a` to be the longest array
    if a.len() < b.len() {
        convolve(into, b, a);
        return
    }

    assert_eq!(into.len(), a.len() - b.len() + 1);

    for (n,place) in into.mut_iter().enumerate() {
        for (x, y) in a.iter().zip(b.slice_from(n).iter()) {
            *place = *place + *x * *y
        }
    }
}

fn main() {
    let mut first_zero = 0;
    let mut both_zero = 0;
    let mut rng = rand::XorShiftRng::new().unwrap();

    for s in PlusMinus::new() {
        for _ in range(0, ITERS) {
            let mut f = [0, .. N];
            while f.iter().all(|x| *x == 0) {
                for p in f.mut_iter() {
                    match rng.gen::<u32>() % 4 {
                        0 => *p = -1,
                        1 | 2 => *p = 0,
                        _ => *p = 1
                    }
                }
            }

            let mut fs = [0, .. 2];
            convolve(fs, s, f);

            if fs[0] == 0 { first_zero += 1 }
            if fs.iter().all(|&x| x == 0) { both_zero += 1 }
        }
    }

    println!("{}\n{}", first_zero, both_zero);
}



/// An iterator over [+-]1 arrays of the appropriate length
struct PlusMinus {
    done: bool,
    current: [i32, .. N + 1]
}
impl PlusMinus {
    fn new() -> PlusMinus {
        PlusMinus { done: false, current: [-1, .. N + 1] }
    }
}

impl Iterator<[i32, .. N + 1]> for PlusMinus {
    fn next(&mut self) -> Option<[i32, .. N+1]> {
        if self.done {
            return None
        }

        let ret = self.current;

        // a binary "adder", that just adds one to a bit vector (where
        // -1 is the zero, and 1 is the one).
        for (i, place) in self.current.mut_iter().enumerate() {
            *place = -*place;
            if *place == 1 {
                break
            } else if i == N {
                // we've wrapped, so we want to stop after this one
                self.done = true
            }
        }

        Some(ret)
    }
}
  • Compiled with --opt-level=3
  • My rust compiler is a recent nightly: (rustc 0.11-pre-nightly (eea4909 2014-04-24 23:41:15 -0700) to be precise)
share|improve this answer
add comment

J

I don't expect to beat out any compiled languages, and something tells me it'd take a miraculous machine to get less than 0.09 s with this, but I'd like to submit this J anyway, because it's pretty slick.

NB. constants
num =: 6
iters =: 1000

NB. convolve
NB. take the multiplication table                */
NB. then sum along the NE-SW diagonals           +//.
NB. and keep the longest ones                    #~ [: (= >./) #/.
NB. operate on rows of higher dimensional lists  " 1
conv =: (+//. #~ [: (= >./) #/.) @: (*/) " 1

NB. main program
S  =: > , { (num+1) # < _1 1                NB. all {-1,1}^(num+1)
F  =: (3&= - 0&=) (iters , num) ?@$ 4       NB. iters random arrays of length num
FS =: ,/ S conv/ F                          NB. make a convolution table
FB =: +/ ({. , *./)"1 ] 0 = FS              NB. first and both zero
('first zero ',:'both zero ') ,. ":"0 FB    NB. output results

This takes about 0.5 s on a laptop from the previous decade, only about 20x as fast as the Python in the answer. Most of the time is spent in conv because we write it lazily (we compute the entire convolution) and in full generality.

Since we know things about S and F, we can speed things up by making specific optimizations for this program. The best I've been able to come up with is conv =: ((num, num+1) { +//.)@:(*/)"1—select specifically the two numbers that correspond from the diagonal sums to the longest elements of the convolution—which approximately halves the time.

share|improve this answer
add comment

Perl - 89% improvement

On my ancient netbook, the OP's code takes 53 seconds to run; Alistair Buxton's version takes about 6.5 seconds, and the following Perl version takes about 5.7 seconds.

use v5.10;
use strict;
use warnings;

use Algorithm::Combinatorics qw( variations_with_repetition );
use List::Util qw( any sum );
use List::MoreUtils qw( pairwise );

my $n         = 6;
my $iters     = 1000;
my $firstzero = 0;
my $bothzero  = 0;

my $variations = variations_with_repetition([-1, 1], $n+1);
while (my $S = $variations->next)
{
  for my $i (1 .. $iters)
  {
    my @F;
    until (@F and any { $_ } @F)
    {
      @F = map +((-1,0,0,1)[rand 4]), 1..$n;
    }

    # The pairwise function doesn't accept array slices,
    # so need to copy into a temp array @S0
    my @S0 = @$S[0..$n-1];

    unless (sum pairwise { $a * $b } @F, @S0)
    {
      $firstzero++;
      my @S1 = @$S[1..$n];  # copy again :-(
      $bothzero++ unless sum pairwise { $a * $b } @F, @S1;
    }
  }
}

say "firstzero ", $firstzero;
say "bothzero ", $bothzero;
share|improve this answer
add comment

C

Takes ~ 0.032s (or ~ 0.030s with -O flag of gcc) on my machine, with OP's original code taking ~ 7.7s. Tried to optimize by generating the random array and convolving in the same loop, but it doesn't seem to make a lot of difference.

The first array is generated by taking an integer, write it out in binary, and change all 1 to -1 and all 0 to 1. The rest should be very straightforward.

Edit: instead of having n as an int, now we have n as a macro-defined constant, so we can use int arr[n]; instead of malloc.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define n (6)
#define iters (1000)
int main() {
    int firstzero=0, bothzero=0;
    int arr[n+1];
    unsigned int i, j;
    srand(time(NULL));

    for(i=0; i< 1<<(n+1) ; i++) {
        unsigned int tmp=i;
        for(j=0; j<n+1; j++) {
            arr[j]=(tmp&1)*(-2)+1;
            tmp>>=1;
        }
        for(j=0; j<iters; j++) {
            int randArr[n];
            unsigned int k, flag=0;
            int first=0, second=0;
            for(k=0; k<n; k++) {
                randArr[k]=rand()%4;
                if(randArr[k]==2) randArr[k]=0;
                if(randArr[k]==3) randArr[k]=-1;
                if(randArr[k]) flag++;
                if(k==n-1&&!flag) {
                    k=-1;
                    continue;
                }
                else {
                    first+=arr[k]*randArr[k];
                    second+=arr[k+1]*randArr[k];
                }
            }
            firstzero+=(!first);
            bothzero+=(!first&&!second);
        }
    }
    printf("firstzero %d\nbothzero %d\n", firstzero, bothzero);
    return 0;
}
share|improve this answer
add comment

Julia: 12.149 6.929 s

Despite their claims to speed, the initial JIT compilation time holds us back!

Note that the following Julia code is effectively a direct translation of the original Python code (no optimisations made) as a demonstration that you can easily transfer your programming experience to a faster language ;)

require("Iterators")

n = 6
iters = 1000
firstzero = 0
bothzero = 0

for S in Iterators.product(fill([-1,1], n+1)...)
    for i = 1:iters
        F = [[-1 0 0 1][rand(1:4)] for _ = 1:n]
        while all((x) -> round(x,8) == 0, F)
            F = [[-1 0 0 1][rand(1:4)] for _ = 1:n]
        end
        FS = conv(F, [S...])
        if round(FS[1],8) == 0
            firstzero += 1
        end
        if all((x) -> round(x,8) == 0, FS)
            bothzero += 1
        end
    end
end

println("firstzero ", firstzero)
println("bothzero ", bothzero)

Edit

Running with n = 8 takes 32.935 s. Considering that the complexity of this algorithm is O(2^n), then 4 * (12.149 - C) = (32.935 - C), where C is a constant representing the JIT compilation time. Solving for C we find that C = 5.2203, suggesting that actual execution time for n = 6 is 6.929 s.

share|improve this answer
    
How about increasing n to 8 to see if Julia comes into its own then? –  Lembik 7 hours ago
add comment

C# 0.135s

C# based on Alistair Buxton's plain python: 0.278s
Parallelised C#: 0.135s
Python from the question: 5.907s
Alistair's plain python: 0.853s

I'm not actually certain this implementation is correct - its output is different, if you look at the results down at the bottom.

There's certainly more optimal algorithms. I just decided to use a very similar algorithm to the Python one.

Single-threaded C

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConvolvingArrays
{
    static class Program
    {
        static void Main(string[] args)
        {
            int n=6;
            int iters = 1000;
            int firstzero = 0;
            int bothzero = 0;

            int[] arraySeed = new int[] {-1, 1};
            int[] randomSource = new int[] {-1, 0, 0, 1};
            Random rand = new Random();

            foreach (var S in Enumerable.Repeat(arraySeed, n+1).CartesianProduct())
            {
                for (int i = 0; i < iters; i++)
                {
                    var F = Enumerable.Range(0, n).Select(_ => randomSource[rand.Next(randomSource.Length)]);
                    while (!F.Any(f => f != 0))
                    {
                        F = Enumerable.Range(0, n).Select(_ => randomSource[rand.Next(randomSource.Length)]);
                    }
                    if (Enumerable.Zip(F, S.Take(n), (f, s) => f * s).Sum() == 0)
                    {
                        firstzero++;
                        if (Enumerable.Zip(F, S.Skip(1), (f, s) => f * s).Sum() == 0)
                        {
                            bothzero++;
                        }
                    }
                }
            }

            Console.WriteLine("firstzero {0}", firstzero);
            Console.WriteLine("bothzero {0}", bothzero);
        }

        // itertools.product?
        // http://ericlippert.com/2010/06/28/computing-a-cartesian-product-with-linq/
        static IEnumerable<IEnumerable<T>> CartesianProduct<T>
            (this IEnumerable<IEnumerable<T>> sequences)
        {
            IEnumerable<IEnumerable<T>> emptyProduct =
              new[] { Enumerable.Empty<T>() };
            return sequences.Aggregate(
              emptyProduct,
              (accumulator, sequence) =>
                from accseq in accumulator
                from item in sequence
                select accseq.Concat(new[] { item }));
        }
    }
}

Parallel C#:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace ConvolvingArrays
{
    static class Program
    {
        static void Main(string[] args)
        {
            int n=6;
            int iters = 1000;
            int firstzero = 0;
            int bothzero = 0;

            int[] arraySeed = new int[] {-1, 1};
            int[] randomSource = new int[] {-1, 0, 0, 1};

            ConcurrentBag<int[]> results = new ConcurrentBag<int[]>();

            // The next line iterates over arrays of length n+1 which contain only -1s and 1s
            Parallel.ForEach(Enumerable.Repeat(arraySeed, n + 1).CartesianProduct(), (S) =>
            {
                int fz = 0;
                int bz = 0;
                ThreadSafeRandom rand = new ThreadSafeRandom();
                for (int i = 0; i < iters; i++)
                {
                    var F = Enumerable.Range(0, n).Select(_ => randomSource[rand.Next(randomSource.Length)]);
                    while (!F.Any(f => f != 0))
                    {
                        F = Enumerable.Range(0, n).Select(_ => randomSource[rand.Next(randomSource.Length)]);
                    }
                    if (Enumerable.Zip(F, S.Take(n), (f, s) => f * s).Sum() == 0)
                    {
                        fz++;
                        if (Enumerable.Zip(F, S.Skip(1), (f, s) => f * s).Sum() == 0)
                        {
                            bz++;
                        }
                    }
                }

                results.Add(new int[] { fz, bz });
            });

            foreach (int[] res in results)
            {
                firstzero += res[0];
                bothzero += res[1];
            }

            Console.WriteLine("firstzero {0}", firstzero);
            Console.WriteLine("bothzero {0}", bothzero);
        }

        // itertools.product?
        // http://ericlippert.com/2010/06/28/computing-a-cartesian-product-with-linq/
        static IEnumerable<IEnumerable<T>> CartesianProduct<T>
            (this IEnumerable<IEnumerable<T>> sequences)
        {
            IEnumerable<IEnumerable<T>> emptyProduct =
              new[] { Enumerable.Empty<T>() };
            return sequences.Aggregate(
              emptyProduct,
              (accumulator, sequence) =>
                from accseq in accumulator
                from item in sequence
                select accseq.Concat(new[] { item }));
        }
    }

    // http://stackoverflow.com/a/11109361/1030702
    public class ThreadSafeRandom
    {
        private static readonly Random _global = new Random();
        [ThreadStatic]
        private static Random _local;

        public ThreadSafeRandom()
        {
            if (_local == null)
            {
                int seed;
                lock (_global)
                {
                    seed = _global.Next();
                }
                _local = new Random(seed);
            }
        }
        public int Next()
        {
            return _local.Next();
        }
        public int Next(int maxValue)
        {
            return _local.Next(maxValue);
        }
    }
}

Test output:

Windows (.NET)

The C# is much faster on Windows. Probably because .NET is faster than mono.

User and sys timing doesn't seem to work (used git bash for timing).

$ time /c/Python27/python.exe numpypython.py
firstzero 27413
bothzero 12073

real    0m5.907s
user    0m0.000s
sys     0m0.000s
$ time /c/Python27/python.exe plainpython.py
firstzero 26983
bothzero 12033

real    0m0.853s
user    0m0.000s
sys     0m0.000s
$ time ConvolvingArrays.exe
firstzero 28526
bothzero 6453

real    0m0.278s
user    0m0.000s
sys     0m0.000s
$ time ConvolvingArraysParallel.exe
firstzero 28857
bothzero 6485

real    0m0.135s
user    0m0.000s
sys     0m0.000s

Linux (mono)

bob@phoebe:~/convolvingarrays$ time python program.py
firstzero 27059
bothzero 12131

real    0m11.932s
user    0m11.912s
sys     0m0.012s
bob@phoebe:~/convolvingarrays$ mcs -optimize+ -debug- program.cs
bob@phoebe:~/convolvingarrays$ time mono program.exe
firstzero 28982
bothzero 6512

real    0m1.360s
user    0m1.532s
sys     0m0.872s
bob@phoebe:~/convolvingarrays$ mcs -optimize+ -debug- parallelprogram.cs
bob@phoebe:~/convolvingarrays$ time mono parallelprogram.exe
firstzero 28857
bothzero 6496

real    0m0.851s
user    0m2.708s
sys     0m3.028s
share|improve this answer
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.