Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I'm try to implement SSIM method. This method has already been implemented in Python (source code), but my goal is implement it with using only Python and NumPy. Also my goal is to using this method on big images (1024x1024 and above). But filter2 works very slow (approx. 62 sec. for 1024x1024). cProfile gives me information that _methods.py:16(_sum), fromnumeric.py:1422(sum), and method 'reduce' of 'numpy.ufunc' objects eat main part time of run. Any help and advises welcome.

import numpy

def filter2(window, img):
    range1 = img.shape[0] - window.shape[0] + 1
    range2 = img.shape[1] - window.shape[1] + 1
    res = numpy.zeros((range1, range2), dtype=numpy.double)
    for i in range(range1):
        for j in range(range2):
            res[i,j] = numpy.sum(numpy.multiply(img[i:11+i, j:11+j], window))
    return res

def ssim(img1, img2):
    window = numpy.array([\
        [0.0000,    0.0000, 0.0000, 0.0001, 0.0002, 0.0003, 0.0002, 0.0001, 0.0000, 0.0000, 0.0000],\
        [0.0000,    0.0001, 0.0003, 0.0008, 0.0016, 0.0020, 0.0016, 0.0008, 0.0003, 0.0001, 0.0000],\
        [0.0000,    0.0003, 0.0013, 0.0039, 0.0077, 0.0096, 0.0077, 0.0039, 0.0013, 0.0003, 0.0000],\
        [0.0001,    0.0008, 0.0039, 0.0120, 0.0233, 0.0291, 0.0233, 0.0120, 0.0039, 0.0008, 0.0001],\
        [0.0002,    0.0016, 0.0077, 0.0233, 0.0454, 0.0567, 0.0454, 0.0233, 0.0077, 0.0016, 0.0002],\
        [0.0003,    0.0020, 0.0096, 0.0291, 0.0567, 0.0708, 0.0567, 0.0291, 0.0096, 0.0020, 0.0003],\
        [0.0002,    0.0016, 0.0077, 0.0233, 0.0454, 0.0567, 0.0454, 0.0233, 0.0077, 0.0016, 0.0002],\
        [0.0001,    0.0008, 0.0039, 0.0120, 0.0233, 0.0291, 0.0233, 0.0120, 0.0039, 0.0008, 0.0001],\
        [0.0000,    0.0003, 0.0013, 0.0039, 0.0077, 0.0096, 0.0077, 0.0039, 0.0013, 0.0003, 0.0000],\
        [0.0000,    0.0001, 0.0003, 0.0008, 0.0016, 0.0020, 0.0016, 0.0008, 0.0003, 0.0001, 0.0000],\
        [0.0000,    0.0000, 0.0000, 0.0001, 0.0002, 0.0003, 0.0002, 0.0001, 0.0000, 0.0000, 0.0000]\
    ], dtype=numpy.double)

    K = [0.01, 0.03]
    L = 65535

    C1 = (K[0] * L) ** 2
    C2 = (K[1] * L) ** 2

    mu1 = filter2(window, img1)
    mu2 = filter2(window, img2)

    mu1_sq = numpy.multiply(mu1, mu1)
    mu2_sq = numpy.multiply(mu2, mu2)
    mu1_mu2 = numpy.multiply(mu1, mu2)

    sigma1_sq = filter2(window, numpy.multiply(img1, img1)) - mu1_sq
    sigma2_sq = filter2(window, numpy.multiply(img2, img2)) - mu2_sq
    sigma12 = filter2(window, numpy.multiply(img1, img2)) - mu1_mu2

    ssim_map = numpy.divide(numpy.multiply((2*mu1_mu2 + C1), (2*sigma12 + C2)), numpy.multiply((mu1_sq + mu2_sq + C1),(sigma1_sq + sigma2_sq + C2)))
    return numpy.mean(ssim_map)

def calc_ssim():

    img1 = numpy.array(numpy.zeros((1024,1024)),dtype=numpy.double)
    img2 = numpy.array(numpy.zeros((1024,1024)),dtype=numpy.double)

    return ssim(img1, img2)

Update: After rewrite filter2 function with using as_strided, execute time reduced to ~42.7 sec. But anyway this is very slow in compare with matlab realisation.

def filter2(window, x):
range1 = x.shape[0] - window.shape[0] + 1
range2 = x.shape[1] - window.shape[1] + 1
res = numpy.zeros((range1, range2), dtype=numpy.double)
x1 = as_strided(x,((x.shape[0] - 10)/1 ,(x.shape[1] - 10)/1 ,11,11), (x.strides[0]*1,x.strides[1]*1,x.strides[0],x.strides[1])) * window
for i in xrange(range1):
    for j in xrange(range2):
         res[i,j] = x1[i,j].sum()
return res
share|improve this question
add comment

1 Answer

up vote 1 down vote accepted

In your strided filter2, x1 is (1014, 1014, 11, 11). You are iterating over the 1st 2 dimensions in order to sum on on the last 2. Let sum do all the work for you, res = x1.sum((2,3))

def filter2(window, x):
    range1 = x.shape[0] - window.shape[0] + 1
    range2 = x.shape[1] - window.shape[1] + 1
    x1 = as_strided(x,((x.shape[0] - 10)/1 ,(x.shape[1] - 10)/1 ,11,11), (x.strides[0]*1,x.strides[1]*1,x.strides[0],x.strides[1])) * window
    res = x1.sum((2,3))
    return res

In my tests this gives a 6x speed improvement.

With numpy iteration, especially nested ones over large dimensions like 1014 is a speed killer. You want to vectorize this kind of thing as much as possible.

Traditionally Matlab had the same speed problems, but newer versions recognize and compile loops like yours. That's why your numpy is so much slower.

share|improve this answer
 
thank you, for explanation! On my machine execute time reduced to ~4.3 s. –  Renat Zaripov Sep 30 '13 at 5:34
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.