As part of some code I've written, I needed to perform a 1-D convolution. This convolution ends up happening quite often, so I wanted to optimise it (in some way other than using a FFT, though I could do this if necessary). I expect that my convolution kernels will be ~100, and the vector they are convolved with ~5k.
module Convolve (convolve) where
import qualified Data.Vector as V
convolve :: Num a => V.Vector a -> V.Vector a -> V.Vector a
convolve xs hs = V.map (V.sum . (V.zipWith (*) (V.reverse hs))) slices
where
slices = mkSlices (V.length hs) xs
mkSlices :: Num a => Int -> V.Vector a -> V.Vector (V.Vector a)
mkSlices size vec =
V.map (paddedSlice size vec)
(V.iterateN vecLength (+1) (-(size-1)))
where vecLength = ((V.length vec)) - (-1 * (size -1))
paddedSlice size vec start
| size == 0 = V.empty
| start < 0 && (size+start>0) =
(V.replicate (abs start) 0) V.++ (paddedSlice (size+start) vec 0)
| start < 0 && (size+start<=0) = V.replicate size 0
| start > vL = V.replicate size 0
| (start+size) > vL =
(V.slice start (vL-start) vec)
V.++ (V.replicate (start+size-vL) 0)
| otherwise = V.slice start size vec
where vL = V.length vec
I don't (yet) have any experience optimising Haskell, and so I don't really know where to start. I asked in #haskell, which led to a few changes, but these did not have a drastic effect.
I was thinking to try Unboxed Vectors, but I don't think that U.Vector
(U.Vector a
) is possible. (If I'm wrong, it would be good to know!)
I was also thinking that parallelising might be a good idea. In general, is it better to parallelise "inwards" (e.g. call the outermost map as parallel) or "outwards"?