Take the 2-minute tour ×

Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Python tokenizer in Haskell + Parsec

up vote 2 down vote favorite

I wrote a tokenizer/lexer (difference?) for Python in Haskell: this is my code on GitHub.

I already tested it out with some of CPython's standard library scripts, and it doesn't fail. However, it probably outputs non-conforming output. But I don't care about this right now.

What I'm looking for is a general (functional) style review. This is my first time using Parsec and Haskell for something relatively big, so I expect my functional style to be suboptimal.

Thanks.

Significant code snippets:

parseTokens :: LexemeParser [PositionedToken]
parseTokens = do
  P.skipMany ignorable
  tokens <- concat <$> P.many parseLogicalLine
  P.eof
  endIndents <- length . filter (>0) <$> getIndentLevels
  dedents <- mapM position $ replicate endIndents Dedent
  endMarker <- position EndMarker
  return (tokens ++ dedents ++ [endMarker])


parseIndentation :: LexemeParser [PositionedToken]
parseIndentation = mapM position =<< go
  where
    go = do
      curr <- length <$> P.many (P.char ' ')
      prev <- getIndentLevel
      case curr `compare` prev of
        EQ -> return []
        GT -> return [Indent] <* pushIndentLevel curr
        LT -> do
          levels <- getIndentLevels
          when (not $ elem curr levels) (fail $ "indentation must be: " ++ show levels)
          let (pop, levels') = span (> curr) levels
          putIndentLevels levels'
          return $ replicate (length pop) Dedent


parseLogicalLine :: LexemeParser [PositionedToken]
parseLogicalLine = do
  P.skipMany ignorable
  indentation <- parseIndentation
  (indentation ++) <$> begin
  where
    begin = do
      tokens <- P.many1 parsePositionedToken
      let continue = (tokens ++) <$> begin

      implicitJoin <- (> 0) <$> getOpenBraces
      if implicitJoin then do
        whitespace
        parseComment <|> (P.optional backslash *> eol)
        P.skipMany ignorable
        whitespace
        continue
      else do
        whitespace
        explicitJoin <- P.optionMaybe (backslash *> eol)
        case explicitJoin of
          Nothing   -> ((\t -> tokens ++ [t]) <$> position NewLine) <* P.optional eol
          otherwise -> whitespace *> continue

(I was largely inspired by PureScript's lexer, though I probably ended up with something a lot worse.)

edited Aug 22 at 23:34

asked Aug 22 at 23:18

Gabriel Garcia
1112

add a comment |

Your Answer

Sign up or log in

Post as a guest

Name

Post as a guest

Name

discard

By posting your answer, you agree to the privacy policy and terms of service.

Browse other questions tagged haskell functional-programming parsec or ask your own question.

question feed

asked	14 days ago
viewed	31 times

current community

your communities

more stack exchange communities

Python tokenizer in Haskell + Parsec

Your Answer

Browse other questions tagged haskell functional-programming parsec or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Python tokenizer in Haskell + Parsec

Know someone who can answer? Share a link to this question via email, Google+, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest

Browse other questions tagged haskell functional-programming parsec or ask your own question.

Related

Hot Network Questions