Commit f33c979

[en] rewrote "Nginx Variables (01)" and also added internal sections to improve readability.
1 parent f7905e1 commit f33c979

1 file changed, +163 -116 lines changed

en/01-NginxVariables01.tut

@@ -1,51 +1,60 @@
 = Nginx Variables (01) =
 
-Nginx's configuration is itself a mini language. Many Nginx configurations
-are practically programs.
-The language might not be Turing-Complete, as far as I can see, its design
+== String Container ==
+
+Nginx's configuration files use a micro programming language. Many real-world
+Nginx configuration files are essentially small programs.
+This language's design
 is heavily influenced by
-Perl and Bourne Shell. This is a characteristic feature of Nginx, comparing
+Perl and Bourne Shell as far as I can see, despite the fact that it might not
+be Turing-Complete. This is a distinguishing feature of Nginx, as compared
 to the other web servers
-such as Apache or Lighttpd. Being a language, "Variable" declaration becomes
-a common concept (However,
-exception does exist in Functional Languages such as Haskell)
+like Apache or Lighttpd. Being a programming language, "variables" are
+thus a natural part of it (exceptions do exist, of course, as in pure
+functional languages like Haskell).
+
+Variables are just containers holding various values in imperative languages
+like Perl, Bourne Shell, and C/C++.
+And "values" here can be numbers like C<3.14>, strings like
+C<hello world>, or even complicated things like references to arrays or
+hash tables. For the
+Nginx configuration language, however, variables can only hold one single type
+of value, that is, strings.
 
-For those who know well imperative languages like Perl, Bourne Shell, C/C++,
-variable is nothing but
-a container holding various values, and the "value" can be numbers like
-C<3.14> or strings like
-C<hello world>. Values can be as complicated as references to arrays or
-hash tables too. However in the
-Nginx configuration, variable contains one and only one type of value:
-strings.
+== Variable Syntax and Interpolation ==
 
-For example, our F<nginx.conf> has following variable declaration:
+Let's say our F<nginx.conf> configuration file has the following configuration
+line:
 
 :nginx
 set $a "hello world";
 
-We have used built-in L<ngx_rewrite> module's L<ngx_rewrite/set> command
-to declare and initialize
-the variable C<$a>. Specifically, it is assigned with strings C<hello world>.
-Like Perl and PHP, the
-Nginx syntax requires prefix C<$> to declare and devalue variables.
+where we assign a value to the variable C<$a> via the L<ngx_rewrite/set>
+configuration directive coming from the standard L<ngx_rewrite> module. In
+particular, we assign the string value C<hello world> to it.
 
-Many C<Java> and C<C#> programmers dislike the ugly C<$> variable prefix,
-yet the approach does have
-a few advantages, notably, variables can be embedded directly in a string
-to construct another string
+We can see that the Nginx variable name takes a dollar sign (C<$>) in front of
+it. This is required by the language syntax: whenever we want to reference an
+Nginx variable in the configuration file, we must add a C<$> prefix. This looks
+very familiar to Perl and PHP programmers.
+
+Such variable prefix modifiers may discomfort some C<Java> and C<C#>
+programmers, but this notation does have an
+obvious advantage: variables can be embedded directly into a
+string literal:
 
 :nginx
 set $a hello;
 set $b "$a, $a";
 
-It is using Nginx variable C<$a>, to construct variable C<$b>. Now C<$a>
-is C<hello>, and C<$b> is
-C<hello, hello>. The technique is called "variable interpolation" in Perl.
-It effectively executes
-the string concatenation.
+Here we use the value of the existing Nginx variable C<$a> to construct the
+value for the variable C<$b>. So after these two directives complete execution,
+the value of C<$a> is C<hello>, and that of C<$b> is C<hello, hello>. This technique is
+called "variable interpolation" in the Perl world, which makes ad-hoc string
+concatenation operators largely unnecessary. Let's use the same term for
+the Nginx world from now on.
 
-Let's have a look at another example:
+Let's see another complete example:
 
 :nginx
 server {
@@ -57,34 +66,55 @@ Let's have a look at another example:
 }
 }
 
-The example omits the outter C<http> directive and C<events> directive
-in F<nginx.conf>. With
-the HTTP client utility C<curl>, we can issue a HTTP request to C</test>
-from command line and
-obtain following result:
+This example omits the C<http> directive and C<events> configuration blocks in
+the outermost scope for brevity. If we request this C</test> interface via
+C<curl>, an HTTP client utility, on the command line, we get
 
 :bash
 $ curl 'http://localhost:8080/test'
 foo: hello
 
-Here we use 3rd party module L<ngx_echo> and its command L<ngx_echo/echo>
-to print the value
-of variable C<$foo> as HTTP response.
+Here we use the L<ngx_echo/echo> directive of the 3rd party module L<ngx_echo>
+to print out the value of the C<$foo> variable as the HTTP response.
 
-We can assert that L<ngx_echo/echo> supports "variable interpolation",
-yet we must not take it
-for granted, since not all the variable commands supports "variable interpolation"
+Apparently the arguments of the L<ngx_echo/echo> directive do support
+"variable interpolation", but we
+cannot take it
+for granted for other directives, because not all the configuration directives
+support "variable interpolation"
 and it is
-in fact up to the module's implementation.
+in fact up to the implementation of the directive in that module. Always look
+up the documentation to be sure.
+
+=== Escaping "$" ===
+
+We've already learned that the C<$> character is special and it serves as the
+variable name prefix, but now consider that we want to output a literal C<$>
+character via the L<ngx_echo/echo> directive. The following naive example does
+not work at all:
+
+? :nginx
+? location /t {
+? echo "$";
+? }
+
+We will get the following error message while loading this configuration:
 
-Is there any way to escape C<$> so that it is no more than a typical dollar
-sign by using
-L<ngx_echo/echo> ? The answer is negative (the answer still holds in the
+[emerg] invalid variable name in ...
+
+Obviously Nginx is trying to parse C<$"> as a variable name. Is there a way to
+escape C<$> in the string literal? The answer is "no" (it is still the case in
+the
 latest Nginx stable
-release C<1.0.10>. Luckily this can be done by other module commands, which
-designate C<$> value
-as a Nginx variable, then the variable can be used in L<ngx_echo/echo>,
-example:
+release C<1.2.7>) and I have been hoping that we could write something like
+C<$$> to obtain a literal C<$>.
+
+Luckily, workarounds do exist and here is one proposed by Maxim Dounin: first
+we assign to a variable a literal string containing the dollar sign character
+via a configuration directive that does I<not> support "variable interpolation"
+(remember that not all the directives support "variable interpolation"?), and
+then use L<ngx_echo/echo> to print out this variable's value. Here is such an
+example to demonstrate the idea:
 
 :nginx
 geo $dollar {
@@ -99,30 +129,32 @@ example:
 }
 }
 
-testing result is following:
+Let's test it out:
 
 :bash
 $ curl 'http://localhost:8080/test'
 This is a dollar sign: $
 
-The built-in module L<ngx_geo> and its command L<ngx_geo/geo> are used
-to initialize
-variable C<$dollar> with string C<"$">, thereafter variable C<$dollar>
+Here we make use of the L<ngx_geo/geo> directive of the standard module
+L<ngx_geo> to initialize the
+C<$dollar> variable with the string C<"$">, after which the variable C<$dollar>
 can be used
-for circumstances asking for a dollar sign. Actually, the typical scenario
-L<ngx_geo>
-is applied for, is to assign Nginx variable by taking into account the
-request client
-IP addresses. For above specific example, it is used to initialize C<$dollar>
+wherever we need a literal dollar sign. This works because the L<ngx_geo/geo>
+directive does not
+support "variable interpolation" at all. However, the L<ngx_geo> module
+is designed to set an Nginx variable to different values according to the
+remote client
+address. In the sample above, we just abuse it to initialize the C<$dollar>
 variable
-with the dollar sign string unconditionally.
+with the string C<"$"> unconditionally.
 
-Attention, "variable interpolation" has a special case, where the variable
-name itself
-cannot be delimited from the rest of the string (such as it is right in
-front of letter,
-digit or underscore) Hence a special syntax is needed to handle the case,
-as following:
+=== Disambiguating Variable Names ===
+
+A special case arises in "variable interpolation" when the variable
+name is followed directly by characters that can be part of variable names (like
+letters, digits, and underscores).
+In such cases we can use a special notation to disambiguate the variable name
+from the subsequent literal characters:
 
 :nginx
 server {
@@ -134,27 +166,32 @@ as following:
 }
 }
 
-In the example, variable C<$first> is concatenated with C<world>. If it
-is written
-directly as C<"$firstworld">, Nginx's variable interpolation tries to devalue
-variable
-C<$firstworld> instead of C<$first>. To fix this problem, curly bracket
-can be used
-together with C<$>, such as C<${first}>. Above example has following result:
+Here the variable C<$first> is concatenated with the literal string C<world>.
+If it
+were written
+directly as C<"$firstworld">, Nginx's "variable interpolation" engine (also
+known as the "script engine") would try to access the variable
+C<$firstworld> instead of C<$first>. To resolve the ambiguity, curly brackets
+must be used
+after the C<$> prefix, as in C<${first}>. Let's test this sample:
 
 :bash
 $ curl 'http://localhost:8080/test'
 hello world
 
-Command L<ngx_rewrite/set> (and Command L<ngx_geo/geo>) not only initialize
-a variable,
-effectively it firstly declares the variable. Which means, if the variable
-is not declared yet,
-it is declared automatically (then initialized). In the example, if variable
-C<$a> is not declared,
-C<set> declares the variable at first hand. If variables are not declared,
-Nginx cannot devalue
-them, another example:
+== Variable Declaration or Creation ==
+
+In languages like C/C++, variables must be declared (or created) before they
+can be used so that the compiler can allocate storage and perform type checking
+at compile-time. Similarly, Nginx creates all the Nginx variables while loading
+the configuration file (or in other words, at "configuration time"), so Nginx
+variables are also required to be declared somehow.
+
+Fortunately the L<ngx_rewrite/set> directive and the L<ngx_geo/geo> directive
+mentioned above do have the side effect of declaring or creating Nginx
+variables that they will assign values to later at "request time". If we do not
+declare a variable this way and use it directly in, say, the L<ngx_echo/echo>
+directive, we will get an error. For example,
 
 :nginx
 ? server {
@@ -165,25 +202,26 @@ them, another example:
 ? }
 ? }
 
-Nginx aborts loading configuration:
+Here we do not declare the C<$foo> variable and access its value directly in
+L<ngx_echo/echo>. Nginx will just refuse to load this configuration:
 
 [emerg] unknown "foo" variable
 
-Yes, the server cannot even be started!
+Yes, we cannot even start the server!
+
+Nginx variable creation and assignment happen
+at completely different phases along the timeline.
+Variable creation only occurs when Nginx loads its configuration. On the other
+hand, variable assignment occurs when requests are actually
+being handled. This also means that we can never create new Nginx variables at
+"request time".
 
-More importantly, Nginx variable declaration and initialization happens
-at different phases in the timeline.
-Variable declaration only occurs when Nginx loads its configuration, in
-other words, when Nginx is started.
-On the other hand, variable initialization occurs when actual request is
-being handled. Consequently, server
-fails bootstrap if variable is not declared, further more, new Nginx variables
-cannot be declared dynamically in
-the run time.
+== Variable Scope ==
 
-As soon as a variable is declared in Nginx, its scope is the entire configuration,
+Once an Nginx variable is created, it is visible to the entire configuration,
 regardless of the location
-it is referenced, even for different virtual server directives. Here is
+it is referenced in, even across different virtual server configuration blocks.
+Here is
 an example:
 
 :nginx
@@ -200,11 +238,13 @@ an example:
 }
 }
 
-Variable C<$foo> is declared by command C<set> within C<location /bar>,
-as variable
-visibility is the entire configuration. It can be referenced in C<location
+Here the variable C<$foo> is created by the L<ngx_rewrite/set> directive within
+C<location /bar>,
+and this variable is visible to the entire configuration, so we can
+reference it in C<location
 /foo> without
-causing any error, following are the location outcomes respectively:
+worries. Below is the result of testing these two interfaces via the C<curl>
+tool:
 
 :bash
 $ curl 'http://localhost:8080/foo'
@@ -216,21 +256,28 @@ causing any error, following are the location outcomes respectively:
 $ curl 'http://localhost:8080/foo'
 foo = []
 
-As we can tell, command C<set> is executed within C<location /bar>, so
-the variable is only initialized when C</bar>
-is requested. If C</foo> is requested directly, variable C<$foo> has an
-empty value. Default value is an empty string
-if Nginx variable is not initialized.
-
-The example carries another important feature, i.e. although variable scope
-is the entire configuration, every request
-has its own copies of the declared variables. In the example, variable
-C<$foo> is initialized with value C<32> when C</bar>
-is requested, but it remains empty in the subsequent request to C</foo>
-since every request has their own copy of variables
-
-This is a common pitfall many Nginx newbie stumbles, which is to think
-Nginx variable as "global variable" or configuration
-settings that are shared for the entire server life time. In fact, variables
-cannot last in between different requests.
+We can see that the assignment operation is only performed in requests that
+access C<location /bar>, since the corresponding L<ngx_rewrite/set> directive
+is only used in that location. When requesting the C</foo> interface, we always
+get an empty value for the C<$foo> variable because that is what we get when
+accessing an uninitialized variable.
+
+Another important behavior that we can observe from this example is that even
+though the scope of Nginx variables is the entire configuration, each request
+does have its own version of all those variables. In other words, each
+request has its own copy of value containers for all variables. Requests do not
+interfere with each other even if they are referencing a variable with the same
+name. This is very much like local variables in C/C++ function bodies. Each
+invocation of the C/C++ function does use its own version of those local
+variables.
+
+For instance, in this sample, we request C</bar> and the variable C<$foo> gets
+the value C<32>, which does not affect the value of C<$foo> in subsequent
+requests to C</foo> (it is still uninitialized!), because they correspond to
+different value containers.
+
+One of the most common mistakes for Nginx newcomers is to regard Nginx
+variables as something shared among all the requests. Even though the scope of
+Nginx variables goes across configuration blocks, it never goes beyond request
+boundaries. Essentially we do have two different kinds of scopes here.

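The last paragraph of the rewritten text says that a variable's value never crosses request boundaries. A small configuration sketch makes this easy to observe for yourself. It assumes, like the tutorial's own samples, that the third-party ngx_echo module is compiled in; the C</count> location and the C<$hits> variable are hypothetical names chosen only for illustration:

```nginx
# Hypothetical example: requires the third-party ngx_echo module,
# just like the tutorial's own samples.
server {
    listen 8080;

    location /count {
        # "set" creates $hits at configuration time, so referencing
        # ${hits} inside its own value is not an error. The curly
        # brackets are required: "$hitsx" would name another variable.
        # At request time, every request starts with its own,
        # still uninitialized (empty) $hits container.
        set $hits "${hits}x";

        # Prints "hits = [x]" for every request: the value never
        # accumulates across requests.
        echo "hits = [$hits]";
    }
}
```

Requesting C</count> with C<curl> any number of times should print C<hits = [x]> each time. If C<$hits> were a value shared by all requests, the string would grow by one C<x> per request; it does not, because each request reads and writes its own value container.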