parsing - Pattern Matching in dypgen
I want to handle ambiguities in dypgen. I found something in the manual and want to know how I can use it. In section 5.2 of the manual, "Pattern matching on symbols", there is an example:

    expr:
      | expr OP<"+"> expr { $1 + $2 }
      | expr OP<"*"> expr { $1 * $2 }

OP is matched against "+" or "*", as far as I understand. I also find there:

The patterns can be any Caml patterns (but without the keyword when). For instance this is possible:

    expr: expr<(Function([arg1;arg2],f_body)) as f> expr { action }
So I tried to put other expressions there, but I don't understand what happens. If I put printf in there, it outputs the value of the matched string. If I put (fun x -> printf x) in there, which seems to me to be the same as printf, dypgen complains about a syntax error and points to the end of the expression. If I put Printf.printf in there, it complains with "Syntax error: operator expected". And if I put (fun x -> Printf.printf x) there, it says "Lexing failed with message: lexing: empty token". What do these different error messages mean?

In the end I would like to look something up in a hashtable to check whether the value is in there, but I don't know if that is possible this way. Or isn't it possible at all?
EDIT: A minimal example derived from the forest example in the dypgen demos.

The grammar file forest_parser.dyp contains:
    {
    open Parse_tree
    let dyp_merge = Dyp.keep_all
    }

    %start main
    %layout [' ' '\t']

    %%

    main : np "." "\n" { $1 }

    np:
      | sg {Noun($1)}
      | pl {Noun($1)}

    sg: word <Word("sheep"|"fish")> {Sg($1)}
    sg: word <Word("cat"|"dog")> {Sg($1)}
    pl: word <Word("sheep"|"fish")> {Pl($1)}
    pl: word <Word("cats"|"dogs")> {Pl($1)}

    /* or try:
    sg: word <printf> {Sg($1)}
    pl: word <printf> {Pl($1)}
    */

    word:
      | (['A'-'Z' 'a'-'z']+) {Word($1)}
The file forest.ml now has the following print_forest function:
    let print_forest forest =
      let rec aux1 t = match t with
        | Word x -> print_string x
        | Noun (x) -> (
            print_string "n [";
            aux1 x;
            print_string " ]")
        | Sg (x) -> (
            print_string "sg [";
            aux1 x;
            print_string " ]")
        | Pl (x) -> (
            print_string "pl [";
            aux1 x;
            print_string " ]")
      in
      (* Print each tree of the forest on its own line. *)
      let aux2 t = aux1 t; print_newline () in
      List.iter aux2 forest;
      print_newline ()
And parse_tree.mli contains:
    type tree =
      | Word of string
      | Noun of tree
      | Sg of tree
      | Pl of tree
With this you can determine which numeri fish, sheep, cat(s) etc. are. sheep and fish can be singular and plural; cats and dogs cannot. For example, the input fish. gives:

    n [sg [fish ] ]
    n [pl [fish ] ]
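To make explicit what I expect from the patterns, here is a rough sketch of the same classification in plain OCaml, without dypgen. The helper name readings is made up for this illustration; it only mimics the effect of the four sg/pl rules combined with keeping all parse trees, and says nothing about how dypgen works internally.

```ocaml
(* Same tree type as in parse_tree.mli. *)
type tree =
  | Word of string
  | Noun of tree
  | Sg of tree
  | Pl of tree

(* Hypothetical helper, not part of dypgen: returns every reading of a
   word, in the spirit of keeping all trees of an ambiguous parse. *)
let readings (w : tree) : tree list =
  (* Or-patterns play the role of <Word("sheep"|"fish")> in the grammar. *)
  let sg = match w with
    | Word ("sheep" | "fish") | Word ("cat" | "dog") -> [Noun (Sg w)]
    | _ -> []
  in
  let pl = match w with
    | Word ("sheep" | "fish") | Word ("cats" | "dogs") -> [Noun (Pl w)]
    | _ -> []
  in
  sg @ pl

let () =
  (* "fish" is ambiguous, "cats" is not. *)
  Printf.printf "fish: %d reading(s)\n" (List.length (readings (Word "fish")));
  Printf.printf "cats: %d reading(s)\n" (List.length (readings (Word "cats")))
```

Both matches use only constant patterns and or-patterns, which is the kind of pattern the manual says is allowed (no when guards).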
I knew nothing about dypgen, so I tried to figure it out.

Let's see what I found out.

In the parser.dyp file you can define both the lexer and the parser, or you can use an external lexer. Here's what I did:

My AST looks like this:
parse_prog.mli
    type f =
      | Print of string
      | Function of string list * string * string

    type program = f list
prog_parser.dyp
    {
    open Parse_prog

    (* let dyp_merge = Dyp.keep_all *)

    let string_buf = Buffer.create 10
    }

    %start main

    %relation pf<pr

    %lexer

    let newline = '\n'
    let space = [' ' '\t' '\r']
    let uident = ['A'-'Z']['a'-'z' 'A'-'Z' '0'-'9' '_']*
    let lident = ['a'-'z']['a'-'z' 'A'-'Z' '0'-'9' '_']*

    rule string = parse
      | '"' { () }
      | _ { Buffer.add_string string_buf (Dyp.lexeme lexbuf);
            string lexbuf }

    main lexer =
      newline | space + -> { () }
      "fun" -> ANONYMFUNCTION { () }
      lident -> FUNCTION { Dyp.lexeme lexbuf }
      uident -> MODULE { Dyp.lexeme lexbuf }
      '"' -> STRING { Buffer.clear string_buf;
                      string lexbuf;
                      Buffer.contents string_buf }

    %parser

    main : function_calls eof { $1 }

    function_calls:
      | { [] }
      | function_call ";" function_calls { $1 :: $3 }

    function_call:
      | printf STRING { Print $2 } pr
      | "(" ANONYMFUNCTION lident "->" printf lident ")" STRING { Print $6 } pf
      | nested_modules "." FUNCTION STRING { Function ($1, $3, $4) } pf
      | FUNCTION STRING { Function ([], $1, $2) } pf
      | "(" ANONYMFUNCTION lident "->" FUNCTION lident ")" STRING { Function ([], $5, $8) } pf

    printf:
      | FUNCTION<"printf"> { () }
      | MODULE<"Printf"> "." FUNCTION<"printf"> { () }

    nested_modules:
      | MODULE { [$1] }
      | MODULE "." nested_modules { $1 :: $3 }
This file is the most important one. As you can see, if I have a function call printf "test", the grammar is ambiguous and it can be reduced to either Print "test" or Function ([], "printf", "test"). But, as I realized, I can give priorities to my rules, so if one has a higher priority it will be the one chosen for the first parsing. (Try to uncomment let dyp_merge = Dyp.keep_all and you'll see all the possible combinations.)
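To illustrate the idea outside of dypgen, here is a small plain OCaml sketch of "several readings, keep the one with the highest priority". The priority type, rank, readings and best are names I made up for this illustration; the sketch only models the effect I observed with pf<pr and is not how dypgen implements priorities.

```ocaml
(* Same AST as in parse_prog.mli. *)
type f =
  | Print of string
  | Function of string list * string * string

(* Hypothetical priorities mirroring "%relation pf<pr". *)
type priority = Pf | Pr

let rank = function Pf -> 0 | Pr -> 1

(* Every reading of a call, tagged with the priority of the rule that
   would produce it. A call to printf has two readings. *)
let readings (modules, name, arg) =
  let as_function = [ (Pf, Function (modules, name, arg)) ] in
  match modules, name with
  | ([] | ["Printf"]), "printf" -> (Pr, Print arg) :: as_function
  | _ -> as_function

(* Keep only the reading with the highest priority. *)
let best call =
  let by_rank_desc (p, _) (q, _) = compare (rank q) (rank p) in
  match List.sort by_rank_desc (readings call) with
  | (_, reading) :: _ -> reading
  | [] -> assert false (* readings never returns an empty list *)

let () =
  match best ([], "printf", "test") with
  | Print s -> Printf.printf "Print %S\n" s
  | Function (_, name, _) -> Printf.printf "Function %s\n" name
```

Returning the whole list from readings instead of calling best corresponds roughly to what you see with Dyp.keep_all.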
And in my main:
main.ml
    open Parse_prog

    let print_stlist fmt sl = match sl with
      | [] -> ()
      | _ -> List.iter (Format.fprintf fmt "%s.") sl

    let print_program tl =
      let aux1 t = match t with
        | Function (ml, f, p) ->
          Format.printf "I can't %a%s(\"%s\")@." print_stlist ml f p
        | Print s -> Format.printf "You want to print : %s@." s
      in
      (* Each element of tl is one parse result: the program and a string. *)
      let aux2 (prog, _) =
        List.iter aux1 prog;
        Format.eprintf "------------@."
      in
      List.iter aux2 tl

    let input_file = Sys.argv.(1)

    (* The module generated from prog_parser.dyp is Prog_parser. *)
    let lexbuf = Dyp.from_channel (Prog_parser.pp ()) (open_in input_file)

    let result = Prog_parser.main lexbuf

    let () = print_program result
And, for example, with the following file:
test
    printf "first print";
    Printf.printf "nested print";
    Format.eprintf "nothing possible";
    (fun x -> printf x) "anonymous print";
If I execute ./myexec test I get the following output:
    You want to print : first print
    You want to print : nested print
    I can't Format.eprintf("nothing possible")
    You want to print : x
    ------------
So, TL;DR, the manual example is just there to show you that you can play with your defined tokens (I never defined a token PRINT, just FUNCTION) and match on them to get new rules.

I hope it's clear. I learned a lot with your question ;-)

[EDIT] So, I changed the parser to match what you wanted to see:
    {
    open Parse_prog

    (* let dyp_merge = Dyp.keep_all *)

    let string_buf = Buffer.create 10
    }

    %start main

    %relation pf<pp

    %lexer

    let newline = '\n'
    let space = [' ' '\t' '\r']
    let uident = ['A'-'Z']['a'-'z' 'A'-'Z' '0'-'9' '_']*
    let lident = ['a'-'z']['a'-'z' 'A'-'Z' '0'-'9' '_']*

    rule string = parse
      | '"' { () }
      | _ { Buffer.add_string string_buf (Dyp.lexeme lexbuf);
            string lexbuf }

    main lexer =
      newline | space + -> { () }
      "fun" -> ANONYMFUNCTION { () }
      lident -> FUNCTION { Dyp.lexeme lexbuf }
      uident -> MODULE { Dyp.lexeme lexbuf }
      '"' -> STRING { Buffer.clear string_buf;
                      string lexbuf;
                      Buffer.contents string_buf }

    %parser

    main : function_calls eof { $1 }

    function_calls:
      | { [] } pf
      | function_call <Function((["Printf"] | []), "printf", st)> ";" function_calls { (Print st) :: $3 } pp
      | function_call ";" function_calls { $1 :: $3 } pf

    function_call:
      | nested_modules "." FUNCTION STRING { Function ($1, $3, $4) }
      | FUNCTION STRING { Function ([], $1, $2) }
      | "(" ANONYMFUNCTION lident "->" FUNCTION lident ")" STRING { Function ([], $5, $8) }

    nested_modules:
      | MODULE { [$1] }
      | MODULE "." nested_modules { $1 :: $3 }
Here, as you can see, I don't handle the fact that my function is a print when I parse it, but when I put it in my list of functions. So I match on the algebraic type that was built by my parser. I hope this example is OK for you ;-) (but be warned, this is extremely ambiguous! :-D)
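For comparison, the same pattern can be applied after parsing, in plain OCaml. This is only a sketch of the idea; rewrite_prints is a name I made up, and it assumes the parser has already produced a list of Function values.

```ocaml
(* Same AST as in parse_prog.mli. *)
type f =
  | Print of string
  | Function of string list * string * string

(* Hypothetical post-processing pass: turn every call to printf or
   Printf.printf into a Print node, leave everything else untouched.
   The pattern is the one used on function_call in the grammar. *)
let rec rewrite_prints (calls : f list) : f list =
  match calls with
  | [] -> []
  | Function ((["Printf"] | []), "printf", st) :: rest ->
    Print st :: rewrite_prints rest
  | call :: rest -> call :: rewrite_prints rest

let () =
  let parsed =
    [ Function ([], "printf", "first print");
      Function (["Printf"], "printf", "nested print");
      Function (["Format"], "eprintf", "nothing possible") ]
  in
  let count_prints =
    List.length
      (List.filter (function Print _ -> true | _ -> false)
         (rewrite_prints parsed))
  in
  Printf.printf "%d call(s) recognized as print\n" count_prints
```

Doing it in the grammar keeps everything in one place; doing it afterwards avoids adding another ambiguous rule.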