Understanding parser combinators

  • Published on
    18-Mar-2018

  • View
    1.138

  • Download
    0

DESCRIPTION

Understanding Parser Combinators @ScottWlaschin fsharpforfunandprofit.com/parser let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit" let point = pchar '.'…

Transcript

Understanding Parser Combinators @ScottWlaschin fsharpforfunandprofit.com/parser let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit" let point = pchar '.' let e = pchar 'e' pchar 'E' let optPlusMinus = opt (pchar '-' pchar '+') let nonZeroInt = digitOneNine .>>. manyChars digit |>> fun (first,rest) -> string first + rest let intPart = zero nonZeroInt let fractionPart = point >>. manyChars1 digit let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit Typical code using parser combinators let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit" let point = pchar '.' let e = pchar 'e' pchar 'E' let optPlusMinus = opt (pchar '-' pchar '+') let nonZeroInt = digitOneNine .>>. manyChars digit |>> fun (first,rest) -> string first + rest let intPart = zero nonZeroInt let fractionPart = point >>. manyChars1 digit let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit Overview 1. What is a parser combinator library? 2. The foundation: a simple parser 3. Three basic parser combinators 4. Building combinators from other combinators 5. Improving the error messages 6. Building a JSON parser Part 1 What is a parser combinator library? Something to match Parser Create step in parsing recipe Creating a parsing recipe A “Parser-making" function This is a recipe to make something, not the thing itself Parser Combining parsing recipes A recipe to make a more complicated thing Parser Parser combined with A "combinator" Parser Run Running a parsing recipe input Success or Failure Why parser combinators? • Written in your favorite programming language • No preprocessing needed – Lexing, parsing, AST transform all in one. – REPL-friendly • Easy to create little DSLs – Google "fogcreek fparsec" • Fun way of understanding functional composition Part 2: A simple parser Version 1 – parse the character 'A' input pcharA remaining input true/false Version 1 – parse the character 'A' input pcharA remaining input true/false let pcharA input = if String.IsNullOrEmpty(input) then (false,"") else if input.[0] = 'A' then let remaining = input.[1..] (true,remaining) else (false,input) Version 2 – parse any character matched char input pchar remaining input charToMatch failure message let pchar (charToMatch,input) = if String.IsNullOrEmpty(input) then "No more input" else let first = input.[0] if first = charToMatch then let remaining = input.[1..] (charToMatch,remaining) else sprintf "Expecting '%c'. Got '%c'" charToMatch first Fix – create a choice type to capture either case Success: matched char input pchar Success: remaining input charToMatch Failure: message type Result Fix – create a choice type to capture either case Success: matched char input pchar Success: remaining input charToMatch Failure: message type Result Fix – create a choice type to capture either case Success: matched char input pchar Success: remaining input charToMatch Failure: message type Result let pchar (charToMatch,input) = if String.IsNullOrEmpty(input) then Failure "No more input" else let first = input.[0] if first = charToMatch then let remaining = input.[1..] Success (charToMatch,remaining) else let msg = sprintf "Expecting '%c'. Got '%c'" charToMatch first Failure msg Version 3 – returning a function Success: matched char input pchar Success: remaining input charToMatch Failure: message Version 3 – returning a function Success: matched char input pchar Success: remaining input charToMatch Failure: message Version 3 – returning a function input pchar charToMatch Version 3 – returning a function charToMatch pchar Version 3 – returning a function charToMatch pchar Version 4 – wrapping the function in a type charToMatch pchar Parser Version 4 – wrapping the function in a type charToMatch pchar Parser type Parser) A function that takes a string and returns a Result Version 4 – wrapping the function in a type charToMatch pchar Parser type Parser) Wrapper Creating parsing recipes charToMatch input Parser A parsing recipe for a char Parser Run Running a parsing recipe input Success, or Failure Running a parsing recipe input Parser Parser Run input Success, or Failure let run parser input = // unwrap parser to get inner function let (Parser innerFn) = parser // call inner function with input innerFn input Enough talk, show me some code Part 3: Three basic combinators What is a combinator? • A “combinator” library is a library designed around combining things to get more complex values of the same type. • integer + integer = integer • list @ list = list // @ is list concat • Parser ?? Parser = Parser Basic parser combinators • Parser andThen Parser => Parser • Parser orElse Parser => Parser • Parser map (transformer) => Parser AndThen parser combinator • Run the first parser. – If there is a failure, return. • Otherwise, run the second parser with the remaining input. – If there is a failure, return. • If both parsers succeed, return a pair (tuple) that contains both parsed values. let andThen parser1 parser2 = let innerFn input = // run parser1 with the input let result1 = run parser1 input // test the 1st parse result for Failure/Success match result1 with | Failure err -> Failure err // return error from parser1 | Success (value1,remaining1) -> // run parser2 with the remaining input (continued on next slide..) let andThen parser1 parser2 = [...snip...] let result2 = run parser2 remaining1 // test the 2nd parse result for Failure/Success match result2 with | Failure err -> Failure err // return error from parser2 | Success (value2,remaining2) -> let combinedValue = (value1,value2) Success (combinedValue,remaining2) // return the inner function Parser innerFn OrElse parser combinator • Run the first parser. • On success, return the parsed value, along with the remaining input. • Otherwise, on failure, run the second parser with the original input... • ...and in this case, return the result (success or failure) from the second parser. let orElse parser1 parser2 = let innerFn input = // run parser1 with the input let result1 = run parser1 input // test the result for Failure/Success match result1 with | Success result -> // if success, return the original result result1 | Failure err -> // if failed, run parser2 with the input (continued on next slide..) let orElse parser1 parser2 = [...snip...] | Failure err -> // if failed, run parser2 with the input let result2 = run parser2 input // return parser2's result result2 // return the inner function Parser innerFn Map parser combinator • Run the parser. • On success, transform the parsed value using the provided function. • Otherwise, return the failure let mapP f parser = let innerFn input = // run parser with the input let result = run parser input // test the result for Failure/Success match result with | Success (value,remaining) -> // if success, return the value transformed by f let newValue = f value Success (newValue, remaining) (continued on next slide..) let mapP f parser = [...snip...] | Failure err -> // if failed, return the error Failure err // return the inner function Parser innerFn Parser combinator operators pcharA .>>. pcharB // 'A' andThen 'B' pcharA pcharB // 'A' orElse 'B' pcharA |>> (...) // map ch to something Demo Part 4: Building complex combinators from these basic ones [ 1; 2; 3] |> List.reduce (+) // 1 + 2 + 3 [ pcharA; pcharB; pcharC] |> List.reduce ( .>>. ) // pcharA .>>. pcharB .>>. pcharC [ pcharA; pcharB; pcharC] |> List.reduce ( ) // pcharA pcharB pcharC Using reduce to combine parsers let choice listOfParsers = listOfParsers |> List.reduce ( ) let anyOf listOfChars = listOfChars |> List.map pchar // convert char into Parser |> choice // combine them all let parseLowercase = anyOf ['a'..'z'] let parseDigit = anyOf ['0'..'9'] Using reduce to combine parsers /// Convert a list of parsers into a Parser of list let sequence listOfParsers = let concatResults p1 p2 = // helper p1 .>>. p2 |>> (fun (list1,list2) -> list1 @ list2) listOfParsers // map each parser result to a list |> Seq.map (fun parser -> parser |>> List.singleton) // reduce by concatting the results of AndThen |> Seq.reduce concatResults Using reduce to combine parsers /// match a specific string let pstring str = str // map each char to a pchar |> Seq.map pchar // convert to Parser |> sequence // convert Parser to Parser |>> List.toArray // convert Parser to Parser |>> String Using reduce to combine parsers Demo Yet more combinators “More than one” combinators let many p = ... // zero or more let many1 p = ... // one or more let opt p = ... // zero or one // example let whitespaceChar = anyOf [' '; '\t'; '\n'] let whitespace = many1 whitespaceChar “Throwing away” combinators p1 .>> p2 // throw away right side p1 >>. p2 // throw away left side // keep only the inside value let between p1 p2 p3 = p1 >>. p2 .>> p3 // example let pdoublequote = pchar '"' let quotedInt = between pdoublequote pint pdoublequote “Separator” combinators let sepBy1 p sep = ... /// one or more p separated by sep let sepBy p sep = ... /// zero or more p separated by sep // example let comma = pchar ',' let digit = anyOf ['0'..'9'] let oneOrMoreDigitList = sepBy1 digit comma Demo Part 5: Improving the error messages input Parser Named parsers Name: “Digit” Parsing Function: Named parsers let ( ) = setLabel // infix version run parseDigit "ABC" // without the label // Error parsing "9" : Unexpected 'A' let parseDigit_WithLabel = anyOf ['0'..'9'] "digit" run parseDigit_WithLabel "ABC" // with the label // Error parsing "digit" : Unexpected 'A' input Parser Extra input context Input: * Stream of characters * Line, Column Extra input context run pint "-Z123" // Line:0 Col:1 Error parsing integer // -Z123 // ^Unexpected 'Z' run pfloat "-123Z45" // Line:0 Col:4 Error parsing float // -123Z45 // ^Unexpected 'Z' Part 6: Building a JSON Parser // A type that represents the previous diagram type JValue = | JString of string | JNumber of float | JObject of Map | JArray of JValue list | JBool of bool | JNull Parsing JSON Null // new helper operator. let (>>%) p x = p |>> (fun _ -> x) // runs parser p, but ignores the result // Parse a "null" let jNull = pstring "null" >>% JNull // map to JNull "null" // give it a label Parsing JSON Bool // Parse a boolean let jBool = let jtrue = pstring "true" >>% JBool true // map to JBool let jfalse = pstring "false" >>% JBool false // map to JBool // choose between true and false jtrue jfalse "bool" // give it a label Parsing a JSON String Call this "unescaped char" /// Parse an unescaped char let jUnescapedChar = let label = "char" satisfy (fun ch -> (ch '\\') && (ch '\"') ) label Call this "escaped char" let jEscapedChar = [ // each item is (stringToMatch, resultChar) ("\\\"",'\"') // quote ("\\\\",'\\') // reverse solidus ("\\/",'/') // solidus ("\\b",'\b') // backspace ("\\f",'\f') // formfeed ("\\n",'\n') // newline ("\\r",'\r') // cr ("\\t",'\t') // tab ] // convert each pair into a parser |> List.map (fun (toMatch,result) -> pstring toMatch >>% result) // and combine them into one |> choice "escaped char" // set label Call this "unicode char" "unescaped char" or "escaped char" or "unicode char" let quotedString = let quote = pchar '\"' "quote" let jchar = jUnescapedChar jEscapedChar jUnicodeChar // set up the main parser quote >>. manyChars jchar .>> quote let jString = // wrap the string in a JString quotedString |>> JString // convert to JString "quoted string" // add label Parsing a JSON Number "int part" "sign part" let optSign = opt (pchar '-') let zero = pstring "0" let digitOneNine = satisfy (fun ch -> Char.IsDigit ch && ch '0') "1-9" let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit" let nonZeroInt = digitOneNine .>>. manyChars digit |>> fun (first,rest) -> string first + rest // set up the integer part let intPart = zero nonZeroInt "fraction part" // set up the fraction part let point = pchar '.' let fractionPart = point >>. manyChars1 digit "exponent part" // set up the exponent part let e = pchar 'e' pchar 'E' let optPlusMinus = opt (pchar '-' pchar '+') let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit "exponent part" "int part" "fraction part" "sign part" // set up the main JNumber parser optSign .>>. intPart .>>. opt fractionPart .>>. opt exponentPart |>> convertToJNumber // not shown "number" // add label Parsing JSON Arrays and Objects Completing the JSON Parser // the final parser combines the others together let jValue = choice [ jNull jBool jNumber jString jArray jObject ] Demo: the JSON parser in action Summary • Treating a function like an object – Returning a function from a function – Wrapping a function in a type • Working with a "recipe" (aka "effect") – Combining recipes before running them. • The power of combinators – A few basic combinators: "andThen", "orElse", etc. – Complex parsers are built from smaller components. • Combinator libraries are small but powerful – Less than 500 lines for combinator library – Less than 300 lines for JSON parser itself Want more? • For a production-ready library for F#, search for "fparsec" • There are similar libraries for other languages http://www.quanttec.com/fparsec/ Thanks! @ScottWlaschin fsharpforfunandprofit.com/parser Contact me Slides and video here Let us know if you need help with F# http://fsharpforfunandprofit.com/parser/ http://fsharpforfunandprofit.com/parser/

Recommended

View more >