I'm a newbie with Haskell and I'm trying to learn the language and its idioms while working through a small data analysis project. The first step of my analysis is to take a very long list of data, generate the set of ngrams from that list, then create a histogram from the generated set of ngrams.
This approach seems to work nicely:
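```haskell
import Data.List
import Control.Arrow

-- Pair each distinct element with its number of occurrences (sorted by element).
histogram :: Ord a => [a] -> [(a,Int)]
histogram = map (head &&& length) . group . sort

-- Sliding windows of length n; once at most n elements remain, they form the last window.
ngrams :: Eq a => Int -> [a] -> [[a]]
ngrams n xs
  | nx == xs  = [xs]
  | otherwise = [nx] ++ (ngrams n (tail xs))
  where nx = take n xs
```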
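For comparison, here is a sketch of the same two steps built from other standard tools: `tails` from `Data.List` for the sliding windows, and `Data.Map.Strict.fromListWith` (from the `containers` package that ships with GHC) for counting. The primed names `ngrams'` and `histogram'` are hypothetical, chosen only to avoid clashing with the definitions above. One behavioural difference: `ngrams'` drops windows shorter than `n`, so an input shorter than `n` yields `[]` rather than `[xs]`, and it no longer needs the `Eq` constraint.

```haskell
import Data.List (tails)
import qualified Data.Map.Strict as Map

-- Sliding windows of length n; trailing windows shorter than n are dropped.
ngrams' :: Int -> [a] -> [[a]]
ngrams' n = takeWhile ((== n) . length) . map (take n) . tails

-- Count occurrences with a Map instead of sort + group.
-- Map.toList returns the keys in ascending order, like the sorted version.
histogram' :: Ord a => [a] -> [(a, Int)]
histogram' = Map.toList . Map.fromListWith (+) . map (\x -> (x, 1))
```

Loaded in GHCi, `histogram' (ngrams' 2 "abab")` gives `[("ab",2),("ba",1)]`, the same result as `histogram (ngrams 2 "abab")`. The `Map` version keeps one entry per distinct ngram rather than sorting the whole list, which may help when the data has many repeats.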
The next step is to divide each ngram's frequency by the total number of ngrams and then take the inverse of each of these values.
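Written out, dividing a count c by the total N and then inverting gives 1 / (c / N) = N / c, so the two steps can be fused into one division. A minimal sketch, assuming the input is a histogram like the one above and taking N as the sum of all counts (which equals the number of ngrams); `novelty` is a hypothetical name.

```haskell
-- Inverse relative frequency: 1 / (c / total) simplifies to total / c.
-- The total is the sum of all counts, i.e. the number of ngrams.
novelty :: [(a, Int)] -> [(a, Double)]
novelty hist = [ (g, total / fromIntegral c) | (g, c) <- hist ]
  where
    total = fromIntegral (sum (map snd hist))
```

For the histogram `[("ab",2),("ba",1)]` (three ngrams in total) this gives `[("ab",1.5),("ba",3.0)]`.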
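```haskell
-- Divide two integral values, producing a fractional result.
(//) :: (Integral a, Integral b, Fractional c) => a -> b -> c
(//) a b = fromIntegral a / fromIntegral b

-- Apply f to the second component of every pair.
mapOverSnd :: (a -> b) -> [(c,a)] -> [(c,b)]
mapOverSnd f = map (fst &&& f . snd)

...

let myData = (long list of values)
let my3grams = ngrams 3 myData
let freq3grams = histogram my3grams
-- divide by the total number of ngrams (length my3grams), not the number of distinct ones
let novelty3grams = mapOverSnd ((1/) . (// length my3grams)) freq3grams
```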
My first question is: is this a reasonable way to do this? Is there a more idiomatic way to accomplish mapOverSnd? I don't yet have much dexterity wielding higher-order functions to accomplish my tasks.
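On `mapOverSnd`, a few standard alternatives are sketched below. `Control.Arrow` (already imported above) exports `second`, which for plain functions applies its argument to the second component of a pair; the `Functor` instance for pairs does the same through `fmap`; and a list comprehension spells it out directly. `Data.Bifunctor` in `base` also exports a `second` with the same effect on pairs. The numbered names are hypothetical.

```haskell
import qualified Control.Arrow as Arrow

-- Each variant applies f to the second component of every pair,
-- leaving the first component unchanged.
mapOverSnd1, mapOverSnd2, mapOverSnd3 :: (a -> b) -> [(c, a)] -> [(c, b)]
mapOverSnd1 f = map (Arrow.second f)            -- Control.Arrow.second on functions
mapOverSnd2 f = map (fmap f)                    -- Functor instance of ((,) c)
mapOverSnd3 f xs = [ (k, f v) | (k, v) <- xs ]  -- explicit list comprehension
```

In practice the helper often disappears: `mapOverSnd f` is just `map (second f)` at the call site. Some readers find `fmap` on pairs less obvious than `second`, so the latter may be the clearer choice.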
Also, I'd like to sort the resulting list: first by the value stored in snd, then by the value stored in fst, with the primary sort descending and the secondary sort ascending. This works:
```haskell
-- Order pairs by snd descending, then by fst ascending.
my2TupleOrdering :: (Ord a, Ord b) => (a,b) -> (a,b) -> Ordering
my2TupleOrdering (a1,b1) (a2,b2)
  | b1 < b2   = GT
  | b1 > b2   = LT
  | a1 < a2   = LT
  | a1 > a2   = GT
  | otherwise = EQ
```
However, I was wondering whether there is a more idiomatic way to say "sort these 2-tuples first by snd descending, then by fst ascending."
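A sketch of that ordering, assuming GHC 8.4 or later (where `(<>)` is exported from the `Prelude`): `comparing` and `Down` from `Data.Ord` build the two comparators, and the `Semigroup` instance for `Ordering`, lifted through functions, makes `(<>)` fall through to the second comparison only when the first returns `EQ`. The `sortOn` variant instead uses a key whose tuple ordering is lexicographic. The function names are hypothetical.

```haskell
import Data.List (sortBy, sortOn)
import Data.Ord (Down (..), comparing)

-- snd descending, then fst ascending; (<>) consults the second
-- comparator only when the first one returns EQ.
byFreqThenKey :: (Ord a, Ord b) => (a, b) -> (a, b) -> Ordering
byFreqThenKey = comparing (Down . snd) <> comparing fst

sortDescSndAscFst :: (Ord a, Ord b) => [(a, b)] -> [(a, b)]
sortDescSndAscFst = sortBy byFreqThenKey

-- Same ordering via a sort key: (Down b, a) compares b descending, then a ascending.
sortDescSndAscFst' :: (Ord a, Ord b) => [(a, b)] -> [(a, b)]
sortDescSndAscFst' = sortOn (\(a, b) -> (Down b, a))
```

For example, sorting `[('b',1),('a',2),('c',2),('a',1)]` with either function gives `[('a',2),('c',2),('a',1),('b',1)]`. `sortOn` computes each key only once, which can pay off when the key is expensive; for cheap keys like these, either form reads fine.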
I realize these are relatively minor questions, but I'm interested in exploring the expressive power of Haskell and figured Code Review might be the right forum. Thanks!