PHP - Regex - get elements to render if statement
I'm designing a script and trying to build an if construct without eval in PHP.
It's still incomplete, but I'm blasting through it. It's a templating engine, and this is the "if" part of the engine. No assignment operators are allowed, but I need to test values without allowing PHP code injection, which is precisely why I'm not using eval. It will need to perform individual operations between variables to prevent injection attacks.
The regex must capture:
[if:(a+b-c/d*e)|(x-y)&!(z%3=0)]
output
[elseif:('b'+'atman'='batman')]
output2
[elseif:('b'+'atman'='batman')]
output3
[elseif:('b'+'atman'='batman')]
output4
[else]
output5
[endif]
[if:(a+b-c/d*e)|(x-y)&!(z%3=0)]
output6
[else]
output7
[endif]
The following works for if, elseif, else and endif blocks, along with their condition statements:
$regex = '~^\h*\[if:(.*)\]\R(?<if>(?:(?!\[elseif)[\s\S])+)\R^\h*\[elseif:(.*)\]\R(?<elseif>(?:(?!\[else)[\s\S])+)\R^\h*\[else.*\]\R(?<else>(?:(?!\[endif)[\s\S])+)\R^\[endif\]~xm';
Please help me make the elseif and else parts optional.
Then, for the condition statement, I can get the operations with:
$regex = '~([^\^=<>+\-%/!&|()*]+)([\^+\-%/!|&*])([^\^=<>+\-%/!&|()*]*)~';
However, it only matches them in pairs, so the operator between one pair and the next is missed...
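To make the pairing behaviour concrete, here is a minimal sketch that runs the operator pattern from the question through preg_match_all. The input string and the variable names are only illustrative assumptions, not part of the original script.

```php
<?php
// The operator pattern from the question, unchanged.
$regex = '~([^\^=<>+\-%/!&|()*]+)([\^+\-%/!|&*])([^\^=<>+\-%/!&|()*]*)~';

// Illustrative input (an assumption): one parenthesis-free run of operations.
$input = 'a+b-c/d*e';

preg_match_all($regex, $input, $matches, PREG_SET_ORDER);

// Each match consumes "operand operator operand", so the operator that
// follows an already consumed pair can never start a new match.
$pairs = array_map(fn (array $m): string => $m[0], $matches);
$seenOperators = array_map(fn (array $m): string => $m[2], $matches);

print_r($pairs);          // the matched "left op right" pairs
print_r($seenOperators);  // the operators the pattern actually captured
```

On this input the pattern yields the pairs a+b and c/d, so the - and * that sit between the pairs are never captured.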
Thanks for any help.
(Edit: added a simple if/elseif body parsing regex at the bottom.)
Using PCRE, I think regex recursion should handle nested if/elseif/else/endif constructs.
In its current form it is a loose parse, in that it doesn't really define the form of [if/elseif: body ].
For instance, is [if: the beginning delimiter construct and ] the end? And should an error occur, etc.? It could be done this way if a strict parse is needed.
Right now it is basically using [if: body ] as the beginning delimiter and [endif] as the end delimiter to find the nesting constructs.
Also, it loosely defines the body as [^\]]* which, in a serious parsing situation, would have to be fleshed out to account for quotes and such.
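As a rough illustration of what "accounting for quotes" could look like, the sketch below compares the loose body with a quote-aware variant. The stricter pattern, the variable names and the sample input are my own assumptions, and it only handles single-quoted strings without escaped quotes.

```php
<?php
// Loose body, as in the answer: everything up to the first ']'.
$loose = '~\[if:([^\]]*)\]~';

// Assumed stricter variant: a single-quoted string is consumed as one unit,
// so a ']' inside quotes does not end the body. Escaped quotes are not handled.
$quoteAware = '~\[if:((?:\'[^\']*\'|[^\]\'])*)\]~';

// Illustrative input (an assumption) with ']' inside the quoted strings.
$input = "[if:'a]b'='a]b'] output [endif]";

preg_match($loose, $input, $looseMatch);
preg_match($quoteAware, $input, $strictMatch);

echo $looseMatch[1], "\n";   // body cut short at the ']' inside the quotes
echo $strictMatch[1], "\n";  // full body, quoted ']' included
```

The loose form stops at the first ] it sees, while the quote-aware form only stops at a ] that is outside a quoted string.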
That said, breaking it apart properly is doable, but is more involved. I've done this on a language level, and it's not trivial.
There is a host language usage pseudocode sample at the bottom.
The language recursion demonstrates how to extract nested content correctly.
The regex matches the current outer shell of the core. The core is the inner nested content.
Each call to parsecore() is initiated inside parsecore() itself (except for the initial call from main()).
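The outer-shell-then-recurse idea can be sketched in PHP with a much smaller pattern. Note that IF_BLOCK below is a simplified stand-in I am assuming for illustration, not the full regex from this answer: it only finds [if:...]...[endif] shells and ignores elseif/else and error reporting. parseBlocks() is likewise a hypothetical helper name.

```php
<?php
// Simplified stand-in pattern (an assumption, not the answer's full regex):
// group 1 = if body, group 2 = core (inner content, may hold nested blocks).
// (?R) recurses the whole pattern so nested [if:]...[endif] pairs stay balanced.
const IF_BLOCK = '~\[if:([^\]]*)\]((?:(?!\[(?:if:|endif\])).|(?R))*)\[endif\]~s';

// Hypothetical helper: returns one entry per block found at this level,
// each with its body and the blocks nested inside its core.
function parseBlocks(string $text, int $level = 0): array
{
    $blocks = [];
    preg_match_all(IF_BLOCK, $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $blocks[] = [
            'level'    => $level,
            'body'     => $m[1],
            // The regex only matched the outer shell; recurse into the core.
            'children' => parseBlocks($m[2], $level + 1),
        ];
    }
    return $blocks;
}

$tree = parseBlocks('[if:a] x [if:b] y [endif] z [endif] tail [if:c] w [endif]');
print_r($tree);
```

The regex recursion only keeps the delimiters balanced; it is the host language recursion into group 2 that actually exposes the nested blocks.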
Since the scoping seems to be unspecified, I've made some assumptions, which can be seen littered in the comments.
There is a placeholder for the if/elseif body, which is captured so that it can be parsed (the operations portion). That is really part 2 of this exercise, which I haven't gotten around to doing yet.
Note - I will try to do this, but I don't have time today.
Let me know if you have any questions.
(?s)(?:(?<content>(?&_content))|\[elseif:(?<elseif_body>(?&_ifbody)?)\]|(?<else>(?&_else))|(?<begin>\[if:(?<if_body>(?&_ifbody)?)\])(?<core>(?&_core)|)(?<end>\[endif\])|(?<error>(?&_keyword)))(?(DEFINE)(?<_ifbody>(?>[^\]])+)(?<_core>(?>(?<_content>(?>(?!(?&_keyword)).)+)|(?(<_else>)(?!))(?<_else>(?>\[else\]))|(?(<_else>)(?!))(?>\[elseif:(?&_ifbody)?\])|(?>\[if:(?&_ifbody)?\])(?:(?=.)(?&_core)|)\[endif\])+)(?<_keyword>(?>\[(?:(?:if|elseif):(?&_ifbody)?|endif|else)\])))
(?s)                              # Dot-all modifier

# =====================
# Outer scope
# ---------------
(?:
     (?<content>                  # (1), non-keyword content
          (?&_content)
     )
  |                               # or,
     # --------------
     \[ elseif:                   # else if
     (?<elseif_body>              # (2), else if body
          (?&_ifbody)?
     )
     \]
  |                               # or
     # --------------
     (?<else>                     # (3), else
          (?&_else)
     )
  |                               # or
     # --------------
     (?<begin>                    # (4), if
          \[ if:
          (?<if_body>             # (5), if body
               (?&_ifbody)?
          )
          \]
     )
     (?<core>                     # (6), core
          (?&_core)
       |
     )
     (?<end>                      # (7)
          \[ endif \]             # end if
     )
  |                               # or
     # --------------
     (?<error>                    # (8), unbalanced if, elseif, else, or end
          (?&_keyword)
     )
)

# =====================
# Subroutines
# ---------------
(?(DEFINE)

     # __ if body ----------------------
     (?<_ifbody>                  # (9)
          (?> [^\]] )+
     )

     # __ core -------------------------
     (?<_core>                    # (10)
          (?>
               #
               # __ content ( non-keywords )
               (?<_content>       # (11)
                    (?>
                         (?! (?&_keyword) )
                         .
                    )+
               )
            |
               #
               # __ else
               # guard: only 1 'else'
               # allowed in this core !!
               (?(<_else>)
                    (?!)
               )
               (?<_else>          # (12)
                    (?> \[ else \] )
               )
            |
               #
               # __ elseif
               # guard: no 'else' before an 'elseif'
               # allowed in this core !!
               (?(<_else>)
                    (?!)
               )
               (?>
                    \[ elseif:
                    (?&_ifbody)?
                    \]
               )
            |
               #
               # if (block start)
               (?>
                    \[ if:
                    (?&_ifbody)?
                    \]
               )
               # recurse core
               (?:
                    (?= . )
                    (?&_core)
                 |
               )
               # end if (block end)
               \[ endif \]
          )+
     )

     # __ keyword ----------------------
     (?<_keyword>                 # (13)
          (?>
               \[
               (?:
                    (?: if | elseif )
                    :
                    (?&_ifbody)?
                 |  endif
                 |  else
               )
               \]
          )
     )
)
Host language pseudo-code
bool bstoponerror = false;
regex rxcore(".....");   // the regex above

bool parsecore( string score, int nlevel )
{
    // locals
    bool bfounderror = false;
    bool bbeforeelse = true;
    match _matcher;

    while ( search( score, rxcore, _matcher ) )
    {
        // content
        if ( _matcher["content"].matched == true )
            // just print the non-keyword content
            print ( _matcher["content"].str() );

            // or, analyze the content.
            // if this 'content' has errors and you wish to return:
            // if ( bstoponerror )
            //     bfounderror = true;

        else
        // elseif
        if ( _matcher["elseif_body"].matched == true )
        {
            // check if we are not in a recursion
            if ( nlevel <= 0 )
            {
                // report error, this 'elseif' is outside an 'if/endif' block
                // ( note - this can only occur when nlevel == 0 )
                print ( "\n>> error, 'elseif' not in block, body = " + _matcher["elseif_body"].str() + "\n" );

                // if this 'elseif' error should stop the process.
                if ( bstoponerror == true )
                    bfounderror = true;
            }
            else
            {
                // here, we are inside a core recursion.
                // that means we have not hit an 'else' yet,
                // because all elseif's precede it.
                // print 'elseif'.
                print ( "elseif: " );

                // tbd - body regex below
                // analyze the 'elseif' body.
                // this is where its body is parsed.
                // use the body parsing (operations) regex on it.
                string selifbody = _matcher["elseif_body"].str();

                // if the 'elseif' body has an error that should stop the process.
                if ( bstoponerror == true )
                    bfounderror = true;
            }
        }
        else
        // else
        if ( _matcher["else"].matched == true )
        {
            // check if we are not in a recursion
            if ( nlevel <= 0 )
            {
                // report error, this 'else' is outside an 'if/endif' block
                // ( note - this can only occur when nlevel == 0 )
                print ( "\n>> error, 'else' not in block\n" );

                // if this 'else' error should stop the process.
                if ( bstoponerror == true )
                    bfounderror = true;
            }
            else
            {
                // here, we are inside a core recursion.
                // that means there can only be 1 'else' within
                // the relative scope of a single core.
                // print 'else'.
                print ( _matcher["else"].str() );

                // set the state of 'else'.
                bbeforeelse = false;
            }
        }
        else
        // error ( this can only occur when nlevel == 0 )
        if ( _matcher["error"].matched == true )
        {
            // report error
            print ( "\n>> error, unbalanced " + _matcher["error"].str() + "\n" );

            // if this unbalanced 'if/endif' error should stop the process.
            if ( bstoponerror == true )
                bfounderror = true;
        }
        else
        // if/endif block
        if ( _matcher["begin"].matched == true )
        {
            // print 'if'
            print ( "if:" );

            // tbd - body regex below.
            // analyze the 'if' body.
            // this is where its body is parsed.
            // use the body parsing (operations) regex on it.
            string sifbody = _matcher["if_body"].str();

            // if the 'if' body has an error that should stop the process.
            if ( bstoponerror == true )
                bfounderror = true;
            else
            {
                //////////////////////////////
                // recurse a new 'core'
                bool bresult = parsecore( _matcher["core"].str(), nlevel+1 );
                //////////////////////////////

                // check the recursion result. see if we should unwind.
                if ( bresult == false && bstoponerror == true )
                    bfounderror = true;
                else
                    // print 'end'
                    print ( "endif" );
            }
        }
        else
        {
            // reserved placeholder, won't get here at this time.
        }

        // error-return check
        if ( bfounderror == true && bstoponerror == true )
            return false;
    }

    // finished this core!! return true.
    return true;
}

///////////////////////////////
// main

string strinitial = "...";

bool bresult = parsecore( strinitial, 0 );
if ( bresult == false )
    print ( "parse terminated abnormally, check messages!\n" );
Output sample of the outer core matches
Note that there are many more matches when the inner cores are matched.
** grp 0 - ( pos 0 , len 211 )
     [if:(a+b-c/d*e)|(x-y)&!(z%3=0)] output [elseif:('b'+'atman'='batman')] output2 [elseif:('b'+'atman'='batman')] output3 [elseif:('b'+'atman'='batman')] output4 [else] output5 [endif]
** grp 1 [content] - null
** grp 2 [elseif_body] - null
** grp 3 [else] - null
** grp 4 [begin] - ( pos 0 , len 31 )
     [if:(a+b-c/d*e)|(x-y)&!(z%3=0)]
** grp 5 [if_body] - ( pos 4 , len 26 )
     (a+b-c/d*e)|(x-y)&!(z%3=0)
** grp 6 [core] - ( pos 31 , len 173 )
     output [elseif:('b'+'atman'='batman')] output2 [elseif:('b'+'atman'='batman')] output3 [elseif:('b'+'atman'='batman')] output4 [else] output5
** grp 7 [end] - ( pos 204 , len 7 )
     [endif]
** grp 8 [error] - null
** grp 9 [_ifbody] - null
** grp 10 [_core] - null
** grp 11 [_content] - null
** grp 12 [_else] - null
** grp 13 [_keyword] - null
-----------------------------
** grp 0 - ( pos 211 , len 4 )
** grp 1 [content] - ( pos 211 , len 4 )
** grp 2 [elseif_body] - null
** grp 3 [else] - null
** grp 4 [begin] - null
** grp 5 [if_body] - null
** grp 6 [core] - null
** grp 7 [end] - null
** grp 8 [error] - null
** grp 9 [_ifbody] - null
** grp 10 [_core] - null
** grp 11 [_content] - null
** grp 12 [_else] - null
** grp 13 [_keyword] - null
-----------------------------
** grp 0 - ( pos 215 , len 74 )
     [if:(a+b-c/d*e)|(x-y)&!(z%3=0)] output6 [else] output7 [endif]
** grp 1 [content] - null
** grp 2 [elseif_body] - null
** grp 3 [else] - null
** grp 4 [begin] - ( pos 215 , len 31 )
     [if:(a+b-c/d*e)|(x-y)&!(z%3=0)]
** grp 5 [if_body] - ( pos 219 , len 26 )
     (a+b-c/d*e)|(x-y)&!(z%3=0)
** grp 6 [core] - ( pos 246 , len 36 )
     output6 [else] output7
** grp 7 [end] - ( pos 282 , len 7 )
     [endif]
** grp 8 [error] - null
** grp 9 [_ifbody] - null
** grp 10 [_core] - null
** grp 11 [_content] - null
** grp 12 [_else] - null
** grp 13 [_keyword] - null
This is the if/elseif body regex
Raw
(?|((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)([\^+\-%/*=]+)(?=\s*[^\^=<>+\-%/!&|()\[\]*\s])|\G(?!^)(?<=[\^+\-%/*=])((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)())
Stringed
'~(?|((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)([\^+\-%/*=]+)(?=\s*[^\^=<>+\-%/!&|()\[\]*\s])|\G(?!^)(?<=[\^+\-%/*=])((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)())~'
Expanded
(?|                               # branch reset
     (                            # (1 start), operand
          (?:
               \s*
               [^\^=<>+\-%/!&|()\[\]*\s]
               \s*
          )+
     )                            # (1 end)
     ( [\^+\-%/*=]+ )             # (2), forward operator
     (?=
          \s*
          [^\^=<>+\-%/!&|()\[\]*\s]
     )
  |
     \G
     (?! ^ )
     (?<= [\^+\-%/*=] )
     (                            # (1 start), last operand
          (?:
               \s*
               [^\^=<>+\-%/!&|()\[\]*\s]
               \s*
          )+
     )                            # (1 end)
     ( )                          # (2), last-empty forward operator
)
Here is how it operates:
It assumes very simple constructs.
It will just parse the math operand/operator stuff.
It won't parse any enclosing parenthesis blocks, nor any logic or math operators in between them.
If needed, parse any parenthesis blocks ahead of time, i.e. \( [^)]* \) or similar. Or split on the logic operators like |.
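Both pre-processing options mentioned above can be sketched in a few lines of PHP. This is only a minimal illustration that assumes non-nested parentheses and treats | and & as the logic operators; the variable names are mine.

```php
<?php
// Condition taken from the sample template in the question.
$condition = '(a+b-c/d*e)|(x-y)&!(z%3=0)';

// Option 1: pull out the (non-nested) parenthesis blocks ahead of time.
preg_match_all('~\(([^)]*)\)~', $condition, $m);
$groups = $m[1];

// Option 2: split on the logic operators | and &.
$terms = preg_split('~[|&]~', $condition);

print_r($groups);  // contents of each parenthesis block
print_r($terms);   // the pieces between the logic operators
```

Each entry of $groups could then be handed to the body regex on its own. Note that splitting keeps the leading ! on the last term, so negation still has to be handled separately.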
The body regex uses a branch reset to get the operand/operator sequence.
It always matches 2 things.
Group 1 contains the operand, group 2 the operator.
If group 2 is empty, group 1 is the last operand in the sequence.
The valid operators are ^ + - % / * =.
The equals = is included because it separates a cluster of operations, and can be noted as a separation.
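A possible way to consume this from PHP is sketched below. It assumes the stringed form of the body regex with \G as the "continue from the previous match" anchor; BODY_REGEX and tokenizeBody() are hypothetical names introduced only for this example.

```php
<?php
// The stringed body regex from this answer (using \G as the anchor).
const BODY_REGEX = '~(?|((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)([\^+\-%/*=]+)(?=\s*[^\^=<>+\-%/!&|()\[\]*\s])|\G(?!^)(?<=[\^+\-%/*=])((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)())~';

// Hypothetical helper: returns [operand, operator] pairs in source order.
// The last pair of a sequence carries an empty operator.
function tokenizeBody(string $body): array
{
    preg_match_all(BODY_REGEX, $body, $matches, PREG_SET_ORDER);
    $tokens = [];
    foreach ($matches as $m) {
        // Group 1 may include surrounding whitespace, so trim it.
        $tokens[] = [trim($m[1]), $m[2] ?? ''];
    }
    return $tokens;
}

foreach (tokenizeBody('(a+b-c/d*e)') as [$operand, $operator]) {
    echo $operand, ' ', $operator === '' ? '(last)' : $operator, "\n";
}
```

Because of the branch reset, the caller reads group 1 and group 2 the same way for both alternatives and only has to check whether the operator is empty.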
The conclusion is that the body regex is very simple and only suited to simple usage. If any more complexity is involved, it won't be the way to go.
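For the simple case, the operand/operator pairs can be evaluated without eval by dispatching each operator through a whitelist. The sketch below is only an illustration under several assumptions: it needs PHP 8 (match, union types), it handles numeric operands only, it folds strictly left to right with no operator precedence, and applyOperator() and evaluateFlat() are hypothetical names.

```php
<?php
// Hypothetical helper: apply one whitelisted arithmetic operator, no eval().
function applyOperator(string $op, int|float $left, int|float $right): int|float
{
    return match ($op) {
        '+' => $left + $right,
        '-' => $left - $right,
        '*' => $left * $right,
        '/' => $left / $right,
        '%' => $left % $right,
        default => throw new InvalidArgumentException("Operator not allowed: $op"),
    };
}

// Hypothetical helper: fold [operand, operator] pairs strictly left to right.
// Operator precedence is deliberately ignored in this sketch.
function evaluateFlat(array $tokens, array $vars): int|float
{
    if ($tokens === []) {
        throw new InvalidArgumentException('Nothing to evaluate');
    }
    $resolve = function (string $operand) use ($vars): int|float {
        if (is_numeric($operand)) {
            return $operand + 0;           // numeric literal
        }
        if (array_key_exists($operand, $vars)) {
            return $vars[$operand];        // known template variable
        }
        throw new InvalidArgumentException("Unknown operand: $operand");
    };

    $result = null;
    $pending = null;                       // operator waiting for its right operand
    foreach ($tokens as [$operand, $operator]) {
        $value = $resolve($operand);
        $result = $pending === null ? $value : applyOperator($pending, $result, $value);
        $pending = $operator === '' ? null : $operator;
    }
    return $result;
}

$tokens = [['a', '+'], ['b', '*'], ['c', '']];
echo evaluateFlat($tokens, ['a' => 1, 'b' => 2, 'c' => 3]), "\n"; // (1 + 2) * 3
```

Anything that is neither a numeric literal nor a known variable is rejected, so no template text is ever executed as PHP. Real precedence, string operands and comparisons would need a proper expression parser.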
Input/output sample 1:
(a+b-c/d*e)
** grp 1 - ( pos 1 , len 1 )   a
** grp 2 - ( pos 2 , len 1 )   +
------------
** grp 1 - ( pos 3 , len 1 )   b
** grp 2 - ( pos 4 , len 1 )   -
------------
** grp 1 - ( pos 5 , len 1 )   c
** grp 2 - ( pos 6 , len 1 )   /
------------
** grp 1 - ( pos 7 , len 1 )   d
** grp 2 - ( pos 8 , len 1 )   *
------------
** grp 1 - ( pos 9 , len 1 )   e
** grp 2 - ( pos 10 , len 0 )  empty
Input/output sample 2:
('b'+'atman'='batman')
** grp 1 - ( pos 1 , len 3 )   'b'
** grp 2 - ( pos 4 , len 1 )   +
------------
** grp 1 - ( pos 5 , len 7 )   'atman'
** grp 2 - ( pos 12 , len 1 )  =
------------
** grp 1 - ( pos 13 , len 8 )  'batman'
** grp 2 - ( pos 21 , len 0 )  empty