Segmentation using maximum likelihood algorithm on images using python

Question

I would like to perform image segmentation using a maximum likelihood algorithm implemented in Python. The mean vectors and covariance matrices of the classes are known, and by iterating over the images (which are quite big: 5100 x 7020) we can calculate, for each pixel, the probability of it belonging to a given class.

Simply written in Python

import numpy as np
from numpy.linalg import inv
from numpy.linalg import det
...

probImage1 = []
probImage1Vector = []

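# i is defined in the elided code above (presumably the number of bands);
# dividing by 2.0 avoids integer division under Python 2.
norm = 1.0 / (np.power((2*np.pi), i/2.0) * np.sqrt(np.linalg.det(covMatrixClass1)))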
covMatrixInverz = np.linalg.inv(covMatrixClass1)
for x in xrange(x_img):
    for y in xrange(y_img):
        X = realImage[x,y]
        pixelValueDifference = X - meanVectorClass1
        mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
        mult2 = np.dot(covMatrixInverz,pixelValueDifference)
        multMult = np.dot(mult1,mult2)
        expo = np.exp(multMult)     
        probImage1Vector.append(np.multiply(norm,expo))
    probImage1.append(probImage1Vector)
    probImage1Vector = []

The problem is that this code is very slow on large images. Calculations like vector subtraction and multiplication consume a lot of time, even though the vectors are only 1x3.

Could you please give me a hint on how to speed up this code? I would really appreciate it. Sorry if I was not clear; I am still a beginner in Python.

Community · Accepted Answer · 2020-06-20 09:12:55Z

3

Taking a closer look at:

mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
mult2 = np.dot(covMatrixInverz,pixelValueDifference)
multMult = np.dot(mult1,mult2)

We see that the operation is basically:

A.T (d) C (d) A         # where `(d)` is the dot-product

Those three steps can be expressed as a single subscript string in np.einsum, with pA standing for pixelValueDifference, like so -

np.einsum('k,lk,l->',pA,covMatrixInverz,-0.5*pA)
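As a quick sanity check, here is a minimal sketch comparing the three-step version with the single einsum call for one pixel. The random inputs and the way the inverse covariance is built are assumptions made only for illustration; they are not the OP's data.

```python
import numpy as np

# Hypothetical single-pixel inputs, chosen only for illustration.
rng = np.random.RandomState(0)
pA = rng.rand(3)                      # pixel value minus class mean
B = rng.rand(3, 3)
# Symmetric positive-definite matrix, inverted to play the role of covMatrixInverz.
covMatrixInverz = np.linalg.inv(B.dot(B.T) + 3 * np.eye(3))

# Three-step version from the question.
mult1 = np.multiply(-0.5, np.transpose(pA))
mult2 = np.dot(covMatrixInverz, pA)
loop_value = np.dot(mult1, mult2)

# Single einsum call; the empty output after '->' sums over both k and l.
einsum_value = np.einsum('k,lk,l->', pA, covMatrixInverz, -0.5 * pA)

print(np.allclose(loop_value, einsum_value))
```

Both values are the scalar -0.5 * pA.T (d) C (d) pA, so the comparison should hold for any pixel.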

Performing this across both iterators i(=x) and j(=y), we would have a fully vectorized expression -

np.einsum('ijk,lk,ijl->ij',pA,covMatrixInverz,-0.5*pA)

Alternatively, we could perform the first part of the sum-reduction with np.tensordot -

mult2_vectorized = np.tensordot(pA, covMatrixInverz, axes=([2],[1]))
output = np.einsum('ijk,ijk->ij',-0.5*pA, mult2_vectorized)
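To see that both vectorized forms compute the same per-pixel exponent as the double loop, here is a small sketch on a tiny random image. The 4 x 5 image, the mean vector and the inverse covariance are made-up inputs for illustration only.

```python
import numpy as np

# Hypothetical small inputs: a 4 x 5 image with 3 bands.
rng = np.random.RandomState(1)
realImage = rng.rand(4, 5, 3)
meanVectorClass1 = rng.rand(3)
B = rng.rand(3, 3)
covMatrixInverz = np.linalg.inv(B.dot(B.T) + 3 * np.eye(3))

# Broadcasting subtracts the mean vector from every pixel at once.
pA = realImage - meanVectorClass1

# Reference: explicit double loop over pixels.
expected = np.empty(pA.shape[:2])
for i in range(pA.shape[0]):
    for j in range(pA.shape[1]):
        d = pA[i, j]
        expected[i, j] = -0.5 * d.dot(covMatrixInverz).dot(d)

# One-shot einsum over the whole image.
full_einsum = np.einsum('ijk,lk,ijl->ij', pA, covMatrixInverz, -0.5 * pA)

# tensordot for the matrix product, then einsum for the per-pixel dot product.
mult2_vectorized = np.tensordot(pA, covMatrixInverz, axes=([2], [1]))
hybrid = np.einsum('ijk,ijk->ij', -0.5 * pA, mult2_vectorized)

print(np.allclose(expected, full_einsum))
print(np.allclose(expected, hybrid))
```

The result has one value per pixel, i.e. the shape of the image without its band axis.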

Benchmarking

Listing all approaches as functions -

# Original code posted by OP to return array
def org_app(meanVectorClass1, realImage, covMatrixInverz, norm):
    probImage1 = []
    probImage1Vector = []
    x_img, y_img = realImage.shape[:2]
    for x in xrange(x_img):
        for y in xrange(y_img):
            X = realImage[x,y]
            pixelValueDifference = X - meanVectorClass1
            mult1 = np.multiply(-0.5,np.transpose(pixelValueDifference))
            mult2 = np.dot(covMatrixInverz,pixelValueDifference)
            multMult = np.dot(mult1,mult2)
            expo = np.exp(multMult)     
            probImage1Vector.append(np.multiply(norm,expo))
            probImage1.append(probImage1Vector)
            probImage1Vector = []
    return np.asarray(probImage1).reshape(x_img,y_img)
    
def vectorized(meanVectorClass1, realImage, covMatrixInverz, norm):
    pA = realImage - meanVectorClass1
    mult2_vectorized = np.tensordot(pA, covMatrixInverz, axes=([2],[1]))
    return np.exp(np.einsum('ijk,ijk->ij',-0.5*pA, mult2_vectorized))*norm
    
def vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm):
    pA = realImage - meanVectorClass1
    return np.exp(np.einsum('ijk,lk,ijl->ij',pA,covMatrixInverz,-0.5*pA))*norm

Timings -

In [19]: # Setup inputs
    ...: meanVectorClass1 = np.array([23.96000000, 58.159999, 61.5399])
    ...: 
    ...: covMatrixClass1 = np.array([[ 514.20040404,  461.68323232,  364.35515152],
    ...:        [ 461.68323232,  519.63070707,  446.48848485],
    ...:        [ 364.35515152,  446.48848485,  476.37212121]])
    ...: covMatrixInverz = np.linalg.inv(covMatrixClass1)
    ...: 
    ...: norm = 0.234 # Random float number
    ...: realImage = np.random.rand(1000,2000,3)
    ...: 

In [20]: out1 = org_app(meanVectorClass1, realImage, covMatrixInverz, norm )
    ...: out2 = vectorized(meanVectorClass1, realImage, covMatrixInverz, norm )
    ...: out3 = vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm )
    ...: print np.allclose(out1, out2)
    ...: print np.allclose(out1, out3)
    ...: 
True
True

In [21]: %timeit org_app(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 27.8 s per loop

In [22]: %timeit vectorized(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 182 ms per loop

In [23]: %timeit vectorized2(meanVectorClass1, realImage, covMatrixInverz, norm )
1 loops, best of 3: 275 ms per loop

Looks like the fully vectorized einsum + tensordot hybrid solution (vectorized) is doing pretty well: 182 ms against 27.8 s for the original loop, roughly a 150x speedup on this input!

For a further performance boost, one can also look into the numexpr module to speed up the exponential computation on large arrays.
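For the segmentation step itself, here is a rough sketch of how the same vectorized exponent could be used to label each pixel with its most likely class. It is not part of the timings above and makes several assumptions: the function names log_likelihood and segment, the two toy classes and the toy image are all hypothetical, and it works with log-densities instead of np.exp, which is a swapped-in variant that avoids underflow for pixels far from a class mean.

```python
import numpy as np

def log_likelihood(image, mean, cov):
    """Per-pixel Gaussian log-density for one class (image is H x W x bands)."""
    d = image - mean
    cov_inv = np.linalg.inv(cov)
    # Same tensordot + einsum hybrid as above, without the -0.5 factor.
    maha = np.einsum('ijk,ijk->ij', d,
                     np.tensordot(d, cov_inv, axes=([2], [1])))
    _, logdet = np.linalg.slogdet(cov)   # log of the determinant
    k = image.shape[-1]                  # number of bands
    return -0.5 * (maha + logdet + k * np.log(2 * np.pi))

def segment(image, means, covs):
    """Label each pixel with the index of the class of highest likelihood."""
    scores = np.stack([log_likelihood(image, m, c)
                       for m, c in zip(means, covs)])
    return scores.argmax(axis=0)

# Toy data (assumed): two well-separated classes, top half vs. bottom half.
rng = np.random.RandomState(0)
means = [np.array([10.0, 10.0, 10.0]), np.array([200.0, 200.0, 200.0])]
covs = [25.0 * np.eye(3), 25.0 * np.eye(3)]
image = np.concatenate([means[0] + rng.randn(2, 4, 3),
                        means[1] + rng.randn(2, 4, 3)], axis=0)

labels = segment(image, means, covs)
print(labels)
```

Since the logarithm is monotonic, taking the argmax of the log-densities picks the same class as taking the argmax of the densities.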

edited Jun 20, 2020 at 9:12

CommunityBot


answered Jan 15, 2017 at 18:32

Divakar


9 Comments

Kristan Over a year ago

Thank you, really appreciated! I am going to try it out!

Divakar Over a year ago

@user2955708 Any luck yet?

Kristan Over a year ago

please see my comment below!

Divakar Over a year ago

@user2955708 What are the shapes of meanVectorClass1, realImage and covMatrixInverz in your actual use-case?

Kristan Over a year ago

please find the shapes! ('meanVectorClass1', (3L,)) ('covMatrixClass1', (3L, 3L)) ('realImage', (7020L, 5100L, 3L))


Tyler Durden · Accepted Answer · 2017-01-15 18:32:54Z

As a first step, I would get rid of unnecessary function calls like transpose, dot, and multiply. These are all simple calculations which you should be doing inline. When you can actually see what you are doing, instead of hiding things inside of functions, it will be easier to understand the performance problems.

The fundamental issue here is that this appears to be at least a quartic complexity operation. You might want to simply multiply out how many operations you are doing in all of your loops. Is it 500 million, 2 billion, 350 billion? How many?

To get control of performance you need to understand how many instructions you are doing. A modern computer can do about 1 billion instructions per second, but if memory movements are involved, it can be substantially slower.
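To make the "multiply it out" suggestion concrete, here is a rough back-of-the-envelope count for the image shape given in the comments (7020 x 5100 x 3). The per-pixel tally is an assumption: it counts each scalar add or multiply as one operation and treats np.exp and the final scaling as one operation each, ignoring interpreter and function-call overhead.

```python
# Image shape taken from the comments on the other answer.
rows, cols, bands = 7020, 5100, 3
pixels = rows * cols

# Approximate scalar operations per pixel (assumed tally).
subtract = bands                               # X - meanVectorClass1
scale = bands                                  # -0.5 * difference
matvec = bands * bands + bands * (bands - 1)   # 3x3 matrix times vector
dot = bands + (bands - 1)                      # final dot product
extra = 2                                      # np.exp and multiply by norm

per_pixel = subtract + scale + matvec + dot + extra
total = pixels * per_pixel

print(pixels)
print(per_pixel)
print(total)
```

Under these assumptions the total is a fixed per-pixel cost times the number of pixels, about one billion scalar operations for this image.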
