I ran many experiments to better understand how batch works, and why escaping sometimes works and at other times seems to fail. These seem to be the main results.
I built tests so I could identify the order of the discrete phases.
There are multiple areas to examine:
- BatchLineParser - The parser inside of batch files, for lines or blocks
- CmdLineParser - Like the BatchLineParser, but directly at the command prompt, works differently
- LabelParser - How call/goto and labels work
- CommandBlockCaching - How parentheses and caching work
- Tokenizer - How single tokens (groups of characters) are built, and in which phases
The BatchLineParser:
A line of code in a batch file passes through multiple phases (on the command line the expansion is different!).
The process starts with phase 1.
Phase/order
1) Phase(Percent):
- A double %% is replaced by a single %
- Expansion of argument variables (%1, %2, etc.)
- Expansion of %var%; if var does not exist, it is replaced with nothing
- For a complete explanation, read this from dbenham (same thread): percent expansion
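The percent rules above can be tried with a small script. This is only a sketch: the variable names `var` and `undefined` are made up for illustration, and the script is assumed to be saved as a batch file and run without arguments.

```batch
@echo off
setlocal
set "var=content"
set "undefined="

rem A doubled percent sign collapses to one literal percent sign.
echo 100%% done

rem A defined variable expands to its value, an undefined one to nothing.
echo [%var%] [%undefined%]

rem With no arguments passed, the first argument variable expands to nothing.
echo First argument: [%1]
```

Run this way, the three lines print `100% done`, `[content] []` and `First argument: []`.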
1.5) Remove all <CR> (CarriageReturn 0x0d) from the line
2) Phase(Special chars, " <LF> ^ & | < > ( ): Look at each character
- If it is a quote (") toggle the quote flag; while the quote flag is active, the following special characters are no longer special: ^ & | < > ( ).
- If it is a caret (^) the next character has no special meaning and the caret itself is removed. If the caret is the last character of the line, the next line is appended, and the first character of the next line is always handled as an escaped character.
- <LF> stops the parsing immediately, but not with a caret in front
- If it is one of the special characters & | < > split the line at this point. In the case of the pipe (|), both parts get a phase restart (it is a bit more complex ...). For more info on how pipes are parsed and processed, look at this question and its answers: Why does delayed expansion fail when inside a piped block of code?
- In this phase the primary token list is built; token delimiters are <space> <tab> , ; = and <0xFF> (also known as non-breaking space)
- Process parentheses (provides for compound statements across multiple lines):
  - If the parser is not looking for a command token, then ( is not special.
  - If the parser is looking for a command token and finds (, then start a new compound statement and increment the parenthesis counter
  - If the parenthesis counter is > 0 then ) terminates the compound statement and decrements the parenthesis counter.
  - If the line end is reached and the parenthesis counter is > 0 then the next line will be appended to the compound statement (starts again with phase 1)
  - If the parenthesis counter is = 0, and the parser is looking for a command, then ) and all remaining characters on the line are ignored
- In this phase REM, IF and FOR are detected, for their special handling.
- If the first token is "rem", only two tokens are processed, which is important for the multiline caret
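A short script can illustrate the quote flag, the caret escape and the line continuation described in phase 2. It is a minimal sketch for a batch file, not an exhaustive test of the phase.

```batch
@echo off
rem A caret escapes the next character, so the ampersand is printed as text.
echo one ^& two

rem Inside quotes the ampersand is not special, so no caret is needed.
echo "one & two"

rem A caret at the end of a line appends the next line.
echo first ^
second
```

This prints `one & two`, then `"one & two"` (the quotes are kept by echo), then `first second`.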
3) Phase(echo): If echo is on, print the result of phases 1 and 2
- For-loop-blocks are echoed multiple times, first time in the context of the for-loop, with unexpanded for-loop-vars
- For each iteration, the block is echoed with expanded for-loop-vars
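To watch the echo phase, one can switch echo on around a small loop. This is only an observation aid; the exact output depends on the current prompt, so none is shown here.

```batch
@echo on
rem With echo on, cmd prints each command before it runs it.
for %%a in (one two) do echo %%a
@echo off
```

The loop line should appear once with the loop variable still unexpanded, followed by one echoed command per iteration with the variable expanded.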
---- These two phases do not really follow each other directly, but it makes no difference
4) Phase(for-loop-vars expansion): Expansion of %%a and so on
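Because for-loop variables are expanded after the special character phase, their content is not parsed again for special characters. A minimal sketch for a batch file (the string `x & y` is just sample data):

```batch
@echo off
rem The loop variable is expanded after phase 2, so the ampersand
rem in its value is printed as text and does not split the command.
for %%a in ("x & y") do echo %%~a
```

This prints `x & y`; the `~` modifier only strips the surrounding quotes.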
5) Phase(Exclamation mark): Only if delayed expansion is on, look at each character
- If it is a caret (^) the next character has no special meaning, the caret itself is removed
- If it is an exclamation mark, search for the next exclamation mark (carets are not observed anymore), expand to the content of the variable
- Consecutive opening ! are collapsed into a single !
- Any remaining ! that cannot be paired is removed
- If no exclamation mark is found in this phase, the result is discarded and the result of phase 4 is used instead (important for the carets)
- Important: In this phase quotes and other special characters are ignored
- Expanding vars at this stage is "safe", because special characters are not detected anymore (even <CR> or <LF>)
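The following sketch shows two consequences of this phase in a batch file: delayed expansion of a value that contains a special character, and the extra round of caret handling that only happens when the line contains an exclamation mark. The names `var` and `undefined` are illustrative.

```batch
@echo off
setlocal EnableDelayedExpansion
set "var=a & b"

rem Delayed expansion runs after the special character phase,
rem so the ampersand in the value stays plain text.
echo !var!

rem No exclamation mark on this line, so the quoted caret survives.
echo "one^two"

rem An exclamation mark on the line triggers caret handling,
rem and quotes are ignored in this phase, so the caret is removed.
echo "one^two" !undefined!
```

The three echo lines should print `a & b`, `"one^two"` and `"onetwo"` (the last one followed by a trailing space).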
6) Phase(call/caret doubling): Only if the cmd token is CALL
- If the first token is "call", start with phase 1 again, but stop after phase 2; delayed expansion is not processed a second time here
- Remove the first CALL, so multiple CALLs can be stacked
- Double all carets (normal carets seem to stay unchanged, because in phase 2 they are reduced to one again, but inside quotes they are effectively doubled)
7) Phase(Execute): The command is executed
- Different tokens are used here, depending on the internal command executed
- In the case of a set "name=content", everything from the first equal sign up to the last quote of the line is used as the content token; if there is no quote after the equal sign, the rest of the line is used.
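A minimal sketch of the quoted SET form in a batch file (the variable name and the trailing text are made up for illustration):

```batch
@echo off
setlocal
rem The value runs from the first equal sign to the last quote,
rem so the text after the closing quote is not part of it.
set "name=some content" this text is dropped
echo [%name%]
```

This prints `[some content]`.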
CmdLineParser:
Works like the BatchLineParser, but:
- Goto/call of a label isn't allowed
Phase1(Percent):
- %var% will be replaced by the content of var; if the var isn't defined, the expression is left unchanged
- No special handling of %%; the second percent could be the beginning of a var: after set var=content, %%var%% expands to %content%
Phase5(exclamation mark): only if "DelayedExpansion" is enabled
- !var! will be replaced by the content of var; if the var isn't defined, the expression is left unchanged
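The command-line differences can be checked by typing a few lines at an interactive prompt. The variable names are illustrative, and `undefined` is assumed not to be set.

```batch
rem Type these lines at an interactive cmd.exe prompt, not in a batch file.
set var=content
echo %undefined%
echo %%var%%
```

At the prompt, the two echo commands print `%undefined%` and `%content%`, whereas inside a batch file the same two lines would print an empty result (the "ECHO is on." style status message) and `%var%`.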
for-loop-command-block
e.g. for /F "usebackq" %%a IN (`command block`) DO echo %%a
The command block is parsed two times. In the first run, the BatchLineParser (the loop is inside a batch file) or the CmdLineParser (the loop is on the command line) is active; in the second run the CmdLineParser is always active.
In the second run, DelayedExpansion is active only if it is enabled with the registry key.
The second run is like calling the line with cmd /c.
Variables set there are therefore not persistent.
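That a variable set inside the command block does not persist can be shown with a sketch like this one, meant for a batch file. The variable name `var` and its values are illustrative; the ampersand is escaped so that it survives the first parsing run and is only acted on in the second.

```batch
@echo off
setlocal
set "var=before"

rem The backquoted command runs in a child cmd.exe,
rem so its SET does not reach this script.
for /F "usebackq delims=" %%a in (`set "var=inside" ^& echo done`) do echo child said: %%a

echo var is still: %var%
```

This prints `child said: done` followed by `var is still: before`.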
Hope it helps
Jan Erik