
I wrote a tokenizer/lexer (is there a difference?) for Python in Haskell; the full code is on GitHub.

I have already tested it on some of the scripts in CPython's standard library, and it does not fail on any of them. However, the token stream it produces is probably non-conforming in places. I am not concerned about that right now.

What I'm looking for is a general (functional) style review. This is my first time using Parsec and Haskell for something relatively big, so I expect my functional style to be suboptimal.

Thanks.

Significant code snippets:

parseTokens :: LexemeParser [PositionedToken]
parseTokens = do
  P.skipMany ignorable
  tokens <- concat <$> P.many parseLogicalLine
  P.eof
  endIndents <- length . filter (>0) <$> getIndentLevels
  dedents <- mapM position $ replicate endIndents Dedent
  endMarker <- position EndMarker
  return (tokens ++ dedents ++ [endMarker])


parseIndentation :: LexemeParser [PositionedToken]
parseIndentation = mapM position =<< go
  where
    go = do
      curr <- length <$> P.many (P.char ' ')
      prev <- getIndentLevel
      case curr `compare` prev of
        EQ -> return []
        GT -> return [Indent] <* pushIndentLevel curr
        LT -> do
          levels <- getIndentLevels
          unless (curr `elem` levels) (fail $ "indentation must be: " ++ show levels)
          let (pop, levels') = span (> curr) levels
          putIndentLevels levels'
          return $ replicate (length pop) Dedent
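
As a rough illustration of what parseIndentation and the end-of-input step in parseTokens are meant to do, here is a pure sketch of the indentation stack, independent of Parsec. IndentToken, indentStep and closeAll are hypothetical names that are not in the repository, and the sketch assumes the stack is stored innermost level first with 0 at the bottom.

```haskell
-- Hypothetical stand-in for the lexer's indentation tokens.
data IndentToken = Indent | Dedent
  deriving (Eq, Show)

-- | Pure model of one indentation step.
-- Assumption: the stack lists the open levels innermost first and
-- always ends with 0. Returns the tokens to emit together with the
-- updated stack, or an error message when a dedent does not land on
-- a level that is currently open.
indentStep :: Int -> [Int] -> Either String ([IndentToken], [Int])
indentStep curr levels =
  case curr `compare` prev of
    EQ -> Right ([], levels)
    GT -> Right ([Indent], curr : levels)
    LT
      | curr `elem` levels ->
          -- Pop every level deeper than the new column.
          let (popped, rest) = span (> curr) levels
          in Right (replicate (length popped) Dedent, rest)
      | otherwise ->
          Left ("indentation must be: " ++ show levels)
  where
    prev = case levels of
      (l : _) -> l
      []      -> 0

-- | Dedents still owed at end of input: one per open non-zero level.
closeAll :: [Int] -> [IndentToken]
closeAll = map (const Dedent) . filter (> 0)
```

For example, `indentStep 4 [0]` gives `Right ([Indent],[4,0])`, `indentStep 0 [8,4,0]` gives `Right ([Dedent,Dedent],[0])`, and `indentStep 2 [4,0]` is rejected with `Left "indentation must be: [4,0]"`.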


parseLogicalLine :: LexemeParser [PositionedToken]
parseLogicalLine = do
  P.skipMany ignorable
  indentation <- parseIndentation
  (indentation ++) <$> begin
  where
    begin = do
      tokens <- P.many1 parsePositionedToken
      let continue = (tokens ++) <$> begin

      implicitJoin <- (> 0) <$> getOpenBraces
      if implicitJoin then do
        whitespace
        parseComment <|> (P.optional backslash *> eol)
        P.skipMany ignorable
        whitespace
        continue
      else do
        whitespace
        explicitJoin <- P.optionMaybe (backslash *> eol)
        case explicitJoin of
          Nothing   -> ((\t -> tokens ++ [t]) <$> position NewLine) <* P.optional eol
          Just _    -> whitespace *> continue
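
To make the two joining rules in parseLogicalLine easier to discuss, here is a simplified line-level model of them: physical lines are joined while a bracket is still open (implicit joining) or when a line ends in a backslash (explicit joining). This is only a sketch under loud assumptions. It works on raw strings rather than tokens, it ignores brackets and backslashes inside string literals and comments, and depthDelta and joinLines are hypothetical names that do not appear in the repository.

```haskell
-- | Net change in bracket depth contributed by one physical line.
-- Simplification: brackets inside strings and comments are counted too.
depthDelta :: String -> Int
depthDelta = sum . map weight
  where
    weight c
      | c `elem` "([{" = 1
      | c `elem` ")]}" = -1
      | otherwise      = 0

-- | Group physical lines into logical lines.
joinLines :: [String] -> [String]
joinLines = go 0 ""
  where
    go :: Int -> String -> [String] -> [String]
    -- End of input: flush a pending partial line, if any.
    go _ acc [] = [acc | not (null acc)]
    go depth acc (l : ls)
      -- Implicit join: a bracket is still open after this line.
      | depth' > 0      = go depth' (acc ++ stripSlash l ++ " ") ls
      -- Explicit join: the line ends with a backslash.
      | endsWithSlash l = go depth' (acc ++ init l) ls
      -- Otherwise the logical line is complete.
      | otherwise       = (acc ++ l) : go 0 "" ls
      where
        depth' = max 0 (depth + depthDelta l)

    endsWithSlash s = not (null s) && last s == '\\'

    stripSlash s
      | endsWithSlash s = init s
      | otherwise       = s
```

With this model, `joinLines ["f(a,", "b)", "x = 1 + \\", "2", "y"]` yields `["f(a, b)","x = 1 + 2","y"]`.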

(I was largely inspired by PureScript's lexer, though I probably ended up with something a lot worse.)
