Tell me more ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Here's my first significant Haskell program - it presents a count of each word seen on stdin. Any comments are welcome.

import Data.Map

emptyMap = empty :: Map String Integer

countWord m s = alter (\x -> case x of
                               Nothing -> Just 1
                               Just n -> Just (n+1)) s m

mapPairs m = concat [ k ++ ": " ++ (show v) ++ "\n" | (k,v) <- toList m ]

mapTotal = sum . elems

mapOutput m = (mapPairs m) ++ "Total words: " ++ (show (mapTotal m)) ++ "\n"

wc = interact $ mapOutput . (foldl countWord emptyMap) . words

Some of the questions I have are:

  • Am I using Map efficiently here? (I realize there are other solutions which do not use a Map - the main point of this exercise was to learn how to use Data.Map.)

  • How can I make sure that emptyMap, countWord and mapTotal all use the same integral type (Int vs. Integer)? For instance, the type of mapTotal is Map a Integer -> Integer, and so I'll get a type error if I change the definition of emptyMap to empty :: Map String Int. Ideally I'd like to make the choice between Int and Integer in one place for all three definitions.

share|improve this question
2  
You might like fromListWith; you can replace foldl countWord emptyMap with something like fromListWith (+) . flip zip (repeat 1) and skip defining countWord at all. – Daniel Wagner Feb 12 '12 at 4:26

migrated from stackoverflow.com Feb 12 '12 at 7:46

3 Answers

Your usage of Data.Map is sound. I see nothing that could be improved as long as you stay with Data.Map.

If you give all of the functions explicit type declarations, so that the Monomorphism Restriction doesn't apply, you can use generic integral types:

mapTotal :: Integral i => Map a i -> i

Some quick style-related things you could change:

First:

(\x -> case x of
  Nothing -> Just 1
  Just n -> Just (n+1))

... is the same as

Just . maybe 1 (+1)

Second:

You don't need parens around many of your function calls; if you have something like a +.- (foo bar baz) ¤:* b, it means the same thing as a +.- foo bar baz ¤:* b because every operator has lower precedence than function application.

Third:

You use very long indentations. A tip is to insert newlines after = and before control structures like case to make the code more compact. I find this style to be quite good.

share|improve this answer
I should probably have waited until this question was migrated to CodeReview; thought it already had been... – dflemstr Feb 12 '12 at 4:05
The first snippet is actually maybe (Just 1) (Just . (+1)), not maybe 1 (+1). – ehird Feb 12 '12 at 4:41
+1 for the styleguide – Landei Feb 12 '12 at 17:18

Here's what I would do:

import Data.Map

type WordMap = Map String Integer

showWords :: WordMap -> String
showWords m = unlines [ k ++ ": " ++ show v | (k, v) <- toList m ]

showTotal :: WordMap -> String
showTotal = show . sum . elems

showOutput :: WordMap -> String
showOutput m = showWords m ++ "Total words: " ++ showTotal m ++ "\n"

countWords :: String -> WordMap
countWords = fromListWith (+) . flip zip (repeat 1) . words

main :: IO ()
main = interact $ showOutput . countWords
share|improve this answer

It's good code, except for the strictness problem : you're accumulating big thunks of ((1+1)+1)+1.. in your Map. If you're handling long texts with repeated words, you could even end up having a stack overflow when you're finally computing them (in your "mapPair").

In your solution, you'll have to do :

Just n -> Just $! (n+1)

so that the result is evaluated before Just is applied.

Other solutions :

countWord w m = alter ((Just $!) . maybe 1 (+1)) w m

or

fromListWith' :: (v->v->v) -> [(k,v)] -> Map k v
fromListWith' f = foldl' ins empty 
   where ins m (k,v) = insertWith' f k v m

countWords :: String -> Map String Integer
countWords = fromListWith' (+) . flip zip (repeat 1) . words

Any of these solutions will work like your original one but should be a bit faster and less memory hungry, though it is possible that GHC was already optimising that (if you were using -O or -O2).

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.