Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I recently started learning Haskell and as my first project I decided to port a particle simulation I had written in C.

The simulation is pretty simple. We have a cubic box with particles (spheres) inside it. Then we choose to either move a particle or to change the volume of the box and check for any collisions between the particles. If there are any collisions we keep the old configuration.

To speed up collision detection I have also implemented a 'cell linked list' which is in principle a uniform grid with a minimum size of a cell equal to the particle diameter. In the C code I have implemented this using arrays while in the Haskell code I use a Vector of Lists.

You can find my full code here, but here are some of the most important snippets,

The main Simulation Loop, where type Eval a = ReaderT (Env a) (StateT (SimState a) IO)


runSimulation :: Int -> Eval Double ()
runSimulation steps = do
    Env npart ps <-; ask
    forM_ [1..steps] (\i -> do
        forM_ [1..npart+1] (\_ -> do
            SimState config _ variables <-; get
            selection           <-; liftIO $ getRandomR (0, npart)
            if selection < npart
                then do
                    rdx <-; liftIO $ getRandomRs (0.0, _dx variables) >>= \r -> return $ take 3 r
                    rn  <-; liftIO $ getRandomR (0, npart - 1)
                    lift $ moveParticle (vec3fromList rdx) rn
                else do
                    let dv = _dv variables
                    rdv   <-; liftIO $ getRandomR (-dv, dv)
                    racc  <-; liftIO $ getRandomR (0.0, 1.0)
                    let vol = boxVolume $ _box config
                        acc = exp $ (-ps) * rdv + fromIntegral npart * log ((vol + rdv) / vol)
                    when (racc < acc) $ changeVolume rdv
            )
        when (i `mod` 100 == 0) $ do
            state@(SimState _ _ variables) <-; get
            let accVol = fromIntegral (_nVol variables) / 100.0
                accMov = fromIntegral (_nMov variables) / (fromIntegral npart * 100)
                olddx  = _dx variables
                olddv  = _dv variables
                newdx  = (*) olddx (if accMov > 0.3 && olddx < 0.45 then 1.04 else 0.94)
                newdv  = (*) olddv (if accVol > 0.15 then 1.04 else 0.94)
            liftIO $ print i
            liftIO $ printf "dx = %.6f, dv = %.6f, nMov = %d, nVol = %d\n" newdx newdv (_nMov variables) (_nVol variables)
            put $! state{_vars = variables{_dx = newdx, _dv = newdv, _nMov = 0, _nVol = 0}}
            printConfig $ "Data/test" ++ show i ++ ".dat"

Here I'm effectively randomly choosing to either displace a particle or change the volume and every 100 steps I print useful information.

These two moves moveParticle, changeVolume are listed below,


moveParticle :: (RealFrac a, Floating a) => Vec3 a -> Int -> StateIO a ()
moveParticle dx nPart = do
    SimState config@(Configuration box particles) cll variables <-; get
    let particle        = particles V.! nPart
        particle'       = Particle $ applyBC $ _position particle .+. dx
        cllIdx          = cellIndex (_nCells cll) (_position particle')
        isCollision     = any (checkCollision box) [(particle', particles V.! pId) | pId <-; neighbours cll cllIdx, pId /= nPart]
    unless isCollision $ do --Accept move if there is no collision
        let cllIdxOld  = cellIndex (_nCells cll) (_position particle)
            cll'       = if cllIdx == cllIdxOld then cll else cllInsertAt (cllRemoveAt cll cllIdxOld nPart) cllIdx nPart
            particles' = modifyVectorElement nPart particle' particles
        put $! SimState config{_particles = particles'} cll' variables{_nMov = _nMov variables + 1}

changeVolume :: (Floating a, RealFrac a) => a -> Eval a ()
changeVolume dv = do
    SimState config@(Configuration box particles) cll variables <-; get
    Env npart _                                                 <-; ask
    let v                   = boxVolume box
        box'                = let scalef = ((v + dv) / v)**(1.0 / 3.0) in scaleBox scalef box
        isCollision         = any (checkCollision box') combinations
        {-- A list of all the particles pairs that need to be checked for collision.
        For each particle, the particles in the neighbouring cells are used. --}
        combinations        = do
            pId  <-; [0..(npart - 1)]
            let particle = particles V.! pId
                cllIdx   = cellIndex (_nCells cll) (_position particle)
            pId' <-; neighbours cll cllIdx
            guard (pId /= pId')
            return (particle, particles V.! pId')
    unless isCollision $ do --Accept move if there is no collision
        let new_cell_size = zipWith ((/) . fromIntegral) (_nCells cll) box'
            new_n         = map truncate box'
            config'       = config{_box = box'}
            recreate      = any (< 1.0) new_cell_size || any (> 2.0) new_cell_size || any (> 2) (zipWith ((abs .) . (-)) new_n (_nCells cll))
            cll'          = if recreate then generateCellList config' else cll
        put $! SimState config' cll' variables{_nVol = _nVol variables + 1}

I would really appreciate a code review, especially since I just started learning Haskell. The code right now feels a bit messy. As you may have also noticed, I have given the accessor functions an underscore in their name with the intention to use Lenses later on.

I'm especially interested in any performance improvements. My Haskell implementation is up to 10 times slower than the C code which is unacceptable for a simulation. I should also mention that I've tried converting the Cell type to a Data.Seq and the Box type to a Vector but didn't see any performance improvements.

You can find the profiling report here and below I have also posted the space profiling graphs. I don't quite understand from these what is making the Haskell implementation so slow.

enter image description here

enter image description here

enter image description here

UPDATE: I have made most of the changes Peter suggested. The largest boost in performance came indeed from using the ST Monad to modify a Vector and from substituting the MonadRandom with the mersenne-random package.

The code I was using to modify Vectors is the following:


modifyVectorElement ::  Int -> a -> V.Vector a -> V.Vector a
modifyVectorElement !i !x !v = V.modify (\v' -> write v' i x) v 

, which I change to:


modifyVectorElement ::  Int -> a -> Vector a -> Vector a
modifyVectorElement !i !x !v = runST $ unsafeThaw v >>= (\mv -> write mv i x >> unsafeFreeze mv) 

Including all the other changes Peter suggested, brought the time for a 1000 iterations from ~30s to ~3.7s, which is about 25-35% slower than the C equivalent code. This is very satisfactory, but I'm sure more optimizations may be possible.

I'm still surprised by how big a difference optimization makes in Haskell compared to imperative languages, where beyond having a good algorithm, the performance gains from optimizations are much smaller.

For anyone that may be interested, I have updated the code on GitHub with the latest changes.

share|improve this question
1  
Welcome to Code Review. You must include the code you would like reviewed directly in the question as explained by the faq. –  codesparkle May 12 '13 at 18:02
1  
Thank you. I have included some snippets. –  Grieverheart May 12 '13 at 18:34

1 Answer 1

up vote 5 down vote accepted

Heap profiling will not tell you much in this case - memory leaking is definitely not your problem. Using cost-centre profiling would probably be better suited for identification of the hot-spots.

After looking at the profile a bit (using home-grown tools), the worst offenders seem to be:

  1. Lots of calls to stg_newArray, undoubtedly due to the CellList vector getting copied by modify. The documentation says that modify can update destructively, but I am pretty sure that this only refers to situations where the source Vector can be derived to be a temporary value available for fusion.

    To really get destructive updates, you would probably have to explicitly use a MVector and the ST monad. As you already have a monad around, the amount of refactoring involved wouldn't be crushing, but you would obviously lose a bit of purity.

  2. Additionally, the neighbours, getCell and cellNeighbours functions are getting hammered by your program. You are doing some pretty fancy list processing in there that GHC seems to not optimize very well - pretty much all these lists are actually generated and consumed. Helping GHC a bit by spelling out the loops might improve things.

  3. Finally, the default random number generator is slow - any time you use it you get a flurry of Integer allocations behind the scenes. Try to find a better one on Hackage.


Actually, I might have mis-interpreted the profile for point number two a bit - the following is actually a big no-no:

type CellIndex = [Int] -- The index of a cell [i, j, k]

Haskell can do a lot of optimization, but only if it has enough information. As it stands, getCell will get a prefix like:

  case y5 of wild18 {
  [] -> lvl9;
  : i1 ds8 -> case ds8 of wild19 {
  [] -> lvl9;
  : j1 ds9 -> case ds9 of wild20 {
  [] -> lvl9;
  : k1 ds10 -> case ds10 of wild21 {
  [] -> case i1 of wild22 { GHC.Types.I# y6 ->
        case j1 of wild23 { GHC.Types.I# y7 ->
        case k1 of wild24 { GHC.Types.I# y8 -> ... } } };
  : ipv1 ipv2 -> lvl9
  } } } }

Meaning that it traverse the list, checks that it contains exactly three elements, and then unpacks every Int separately. By passing (Int, Int, Int) or your own Vec3 you would give Haskell enough information to derive that it can skip all the heap allocation.


Concerning my original point:

cellNeighbours (CellList !nCells _) cell = do
    i <- [-1..1]
    j <- [-1..1]
    k <- [-1..1]
    let temp = zipWith3 (((+).).(+)) nCells cell [i, j, k]
    return $! zipWith mod temp nCells

Now this is arguably nice code - even though I personally don't like overly tricky (.) constructions or usage of the list monad in the first place (list comprehensions!). The performance problem here is that this becomes something like follows after desugaring:

concatMap (\i -> concatMap (\j -> concatMap (\k -> [...]) [-1..1]) [-1..1]) [-1..1])

GHC will not unroll these, which means that you have constructed a multiple-level loop doing trivial list concatenations in a completely predictable manner.

If you for example wrote it as:

cellNeighbours (CellList (d,e,f) _) (x,y,z) = map add offsets
  where add (a,b,c) = (x+a+d, y+b+e, z+c+f)
        offsets     = [(a,b,c) | a <- [-1..1], b <- [-1..1], c <- [-1..1]]

That would make offsets a global constant that only gets evaluated once and would allow GHC to fuse the map with the concatMap in neighbours, eliminating the intermediate list completely.


Some points on the structure:

  1. You should try to isolate your simulation code as much as possible. Right now it is in "main.hs", which should normally just handle initialization and configuration. Splitting out the actual maths bits would be a pretty straightforward improvement.

  2. On a similar note, you already have a monad, but aren't using it as an encapsulation tool - your core simulation code still uses put directly. Try to find out what "verbs" you are really using, then define some simple wrappers for those (say increaseParticleMoves?). After you have eliminated all direct access, you can then export the monad and all actions to its own module, newtype-wrap it and stop exporting its definition. Having this sort of infrastructure would for example make the switch to ST much easier.

share|improve this answer
    
Hi Peter, thank you very much for taking the time to analyze my code and comment on it. –  Grieverheart May 13 '13 at 17:54
    
On point 2, I'm not sure what you're trying to say, could you be a bit more specific? I also depended on these functions being lazily evaluated, do you know if this is the case? On point 3, I did indeed notice Integer being used and later found out it is due to Random probably calling a time function. Maybe I'm asking too much, but could you perhaps give me any tips beyond performance e.g. style and structure of the code? –  Grieverheart May 13 '13 at 18:01
    
I have added some more comments. And no, the problem with Random isn't time, it's that it internally always first generates a large Integer random number and then cuts it down to what was actually requested. –  Peter Wortmann May 14 '13 at 13:52

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.