Brady Ouren

Parser Combinatorz part2


In the previous post I showed how to use parsec to parse data in a format like this:

"1%400:3.2 6%some_description|100:1"

Why not regex?

I certainly could’ve used a regex pattern like \d\%(\w*\|)?(\d+):(\d+\.?\d?)

…but, there are some scenarios where this falls apart quite quickly:

  • if we learn about other formats of data that can be included
  • if we have other parsing tasks that need similar matchers
  • if we need to morph the data in some way before matching
  • if the list of possible separators is very large (\d\%|\$|\&|...)

An example to prove I’m not making this up

I had never encountered the acronym FFR until I started working in financial software. It stands for Fixed Format Response, but that’s not really important. The important part is that the FFR we’re dealing with has ~100 different signals which indicate a specific type of data.

So, we’ll create a data type deriving Enum to describe how we expect to split the data up.

data Signal
  = AD02 | AD11 | AH11 | AM01 |
    AO01 | AR01 | AS01 | AT11 | BR01 |
    -- ... more of these removed for reading clarity
    UA11 | UF11 | VH01 | VS01 | WS01 |
    YI01 | ZC01
  deriving (Show, Enum, Ord, Eq, Read)

allSignals :: [String]
allSignals = map show [AD02 ..]

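Note: [AD02 ..] enumerates every constructor from AD02 onward, which is all of them because AD02 is declared first. The space before the dots is significant ([YourFirstEnum ..]): without it, AD02.. is read as a module-qualified operator.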
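To make the enumeration concrete, here is a minimal sketch using a shortened, hypothetical subset of the Signal type. The Bounded instance and the names fromFirst and fromBounds are additions for illustration, not part of the original code.

```haskell
-- A shortened, hypothetical subset of the Signal type, for illustration only.
data Signal = AD02 | AD11 | AH11 | AM01
  deriving (Show, Enum, Bounded, Ord, Eq, Read)

-- Enumerate from a named first constructor, as in the post.
-- The space matters: "AD02.." would lex as a module-qualified operator.
fromFirst :: [String]
fromFirst = map show [AD02 ..]

-- Alternative that does not depend on which constructor is declared first.
-- (Bounded is an assumption here; the post's type does not derive it.)
fromBounds :: [String]
fromBounds = map show [minBound .. maxBound :: Signal]
```

Both lists contain the constructor names in declaration order, so either one can feed the parser that follows.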

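-- notice we're reusing this from the previous parser:
-- collect tokens until the terminating parser p succeeds
anythingUntil :: Parser a -> Parser String
anythingUntil p = manyTill anyToken p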

anySignal :: Parser (Signal, String)
anySignal = do
  signal <- signalParser
  content <- anythingUntil (endOfLineOrInput <|> signalLookahead)
  return (toSignal signal, content)

-- succeeds, without consuming input, when a signal name comes next
signalLookahead :: Parser ()
signalLookahead = lookAhead signalParser *> return ()

signalParser :: Parser String
signalParser = choice $ fmap try $ string <$> allSignals

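We’re going to use the anySignal parser to pull out many pieces of content from a string, but the interesting part is the signalParser itself. choice and <|> do the same job: choice tries each parser in a list in order, which amounts to chaining them with <|>. Since we need to choose between all the signals, we pass it a list of Parsers. Each one is wrapped in try so that a partial match (say, the "AD" of "AD02" when the input is "AD11") is rewound before the next alternative runs. If it helps, it looks a bit like this if you were to expand it: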

choice [(try $ string "AD02"), (try $ string "AD11"), ...]
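A small sketch of why the try wrappers matter when signal names share a prefix. The names withoutTry, withTry, and run are hypothetical helpers for this illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Without try: string "AD02" consumes "AD" before failing on '1',
-- so <|> never attempts the second alternative.
withoutTry :: Parser String
withoutTry = string "AD02" <|> string "AD11"

-- With try: a failed alternative rewinds the input before the next one runs.
withTry :: Parser String
withTry = choice [try (string "AD02"), try (string "AD11")]

-- Hypothetical helper: run a parser and discard the error details.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

With these definitions, run withoutTry "AD11" gives Nothing, while run withTry "AD11" gives Just "AD11".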

Another thing to note is signalLookahead. We need to avoid consuming the next signal, so we only peek at it with lookAhead and use it to mark the end of the current signal’s content.
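Putting the pieces together, here is a self-contained sketch that can be loaded into GHCi. The Signal type is a shortened, hypothetical subset. The definitions of endOfLineOrInput and toSignal are assumptions, since the post does not show them, and parseSignals is a hypothetical wrapper added for the example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Shortened, hypothetical subset of the post's Signal type.
data Signal = AD02 | AD11 | AH11 | AM01
  deriving (Show, Enum, Ord, Eq, Read)

allSignals :: [String]
allSignals = map show [AD02 ..]

signalParser :: Parser String
signalParser = choice (map (try . string) allSignals)

-- Succeeds without consuming input when a signal name comes next.
signalLookahead :: Parser ()
signalLookahead = () <$ lookAhead signalParser

-- Assumed definition; the post reuses this from the previous part.
endOfLineOrInput :: Parser ()
endOfLineOrInput = (() <$ try endOfLine) <|> eof

-- Assumed definition: read is safe here because signalParser
-- only ever yields constructor names.
toSignal :: String -> Signal
toSignal = read

anythingUntil :: Parser a -> Parser String
anythingUntil p = manyTill anyToken p

anySignal :: Parser (Signal, String)
anySignal = do
  signal  <- signalParser
  content <- anythingUntil (endOfLineOrInput <|> signalLookahead)
  return (toSignal signal, content)

-- Hypothetical wrapper: parse a whole input as a sequence of signals.
parseSignals :: String -> Either ParseError [(Signal, String)]
parseSignals = parse (many anySignal <* eof) "ffr"
```

For the made-up input "AD02hello AD11world\nAM01x", parseSignals yields the pairs (AD02,"hello "), (AD11,"world"), and (AM01,"x"). The lookahead ends the first chunk without consuming "AD11", and the newline ends the second. One consequence of this design is that content containing a signal name is cut short at that point.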

Once again, there’s a frozen copy of the Jupyter notebook if you’d like to see it in full context (here).

Further processing

There are many more things we can do with our data in this format, but the first thing I would do is consume the data into some Map like this:

type SignalMap = Map.Map Signal String

From here we’d want to inspect what each signal has inside of it, so we can take from this Map and further parse the string content.
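As a rough sketch of that step, the parsed pairs could be folded into a SignalMap as below. The Signal type is again a shortened, hypothetical subset, toSignalMap and contentOf are hypothetical names, and joining repeated signals with a newline is an assumed policy rather than anything the FFR format prescribes.

```haskell
import qualified Data.Map as Map

-- Shortened, hypothetical subset of the post's Signal type.
data Signal = AD02 | AD11 | AH11 | AM01
  deriving (Show, Enum, Ord, Eq, Read)

type SignalMap = Map.Map Signal String

-- Collect parsed pairs into a map. If a signal repeats, its contents are
-- joined with a newline in input order (an assumption for this sketch).
toSignalMap :: [(Signal, String)] -> SignalMap
toSignalMap = Map.fromListWith (\new old -> old ++ "\n" ++ new)

-- Look up one signal's raw content, ready for a more specific parser.
contentOf :: Signal -> SignalMap -> Maybe String
contentOf = Map.lookup
```

For example, contentOf AD02 (toSignalMap [(AD02, "x"), (AM01, "y"), (AD02, "z")]) gives Just "x\nz", and looking up a signal that never appeared gives Nothing.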


Thanks a bunch to both of these resources (which are both far better and more comprehensive than this):