11= Nginx Variables (01) =
22
3- Nginx's configuration is itself a mini language. Many Nginx configurations
4- are practically programs.
5- The language might not be Turing-Complete, as far as I can see, its design
3+ == String Container ==
4+
5+ Nginx's configuration files use a micro programming language. Many real-world
6+ Nginx configuration files are essentially small programs.
7+ This language's design
68is heavily influenced by
7- Perl and Bourne Shell. This is a characteristic feature of Nginx, comparing
9+ Perl and Bourne Shell as far as I can see, despite the fact that it might not
10+ be Turing-Complete. This is a distinguishing feature of Nginx, as compared
811to the other web servers
9- such as Apache or Lighttpd. Being a language, "Variable" declaration becomes
10- a common concept (However,
11- exception does exist in Functional Languages such as Haskell)
12+ like Apache or Lighttpd. Since this is a programming language, "variables" are
13+ thus a natural part of it (exceptions do exist, of course, as in pure
14+ functional languages like Haskell).
15+
16+ Variables are just containers holding various values in imperative languages
17+ like Perl, Bourne Shell, and C/C++.
18+ And "values" here can be numbers like C<3.14>, strings like
19+ C<hello world>, or even complicated things like references to arrays or
20+ hash tables. For the
21+ Nginx configuration language, however, variables can only hold one single type
22+ of value, that is, strings.
1223
13- For those who know well imperative languages like Perl, Bourne Shell, C/C++,
14- variable is nothing but
15- a container holding various values, and the "value" can be numbers like
16- C<3.14> or strings like
17- C<hello world>. Values can be as complicated as references to arrays or
18- hash tables too. However in the
19- Nginx configuration, variable contains one and only one type of value:
20- strings.
24+ == Variable Syntax and Interpolation ==
2125
22- For example, our F<nginx.conf> has following variable declaration:
26+ Let's say our F<nginx.conf> configuration file has the following configuration
27+ line:
2328
2429 :nginx
2530 set $a "hello world";
2631
27- We have used built-in L<ngx_rewrite> module's L<ngx_rewrite/set> command
28- to declare and initialize
29- the variable C<$a>. Specifically, it is assigned with strings C<hello world>.
30- Like Perl and PHP, the
31- Nginx syntax requires prefix C<$> to declare and devalue variables.
32+ where we assign a value to the variable C<$a> via the L<ngx_rewrite/set>
33+ configuration directive coming from the standard L<ngx_rewrite> module. In
34+ particular, we assign the string value C<hello world> to it.
3235
33- Many C<Java> and C<C#> programmers dislike the ugly C<$> variable prefix,
34- yet the approach does have
35- a few advantages, notably, variables can be embedded directly in a string
36- to construct another string
36+ We can see that the Nginx variable name takes a dollar sign (C<$>) in front of
37+ it. This is required by the language syntax: whenever we want to reference an
38+ Nginx variable in the configuration file, we must add a C<$> prefix. This looks
39+ very familiar to Perl and PHP programmers.
40+
41+ Such variable prefix modifiers may discomfort some C<Java> and C<C#>
42+ programmers, but this notation does have an
43+ obvious advantage: variables can be embedded directly into a
44+ string literal:
3745
3846 :nginx
3947 set $a hello;
4048 set $b "$a, $a";
4149
42- It is using Nginx variable C<$a>, to construct variable C<$b>. Now C<$a>
43- is C<hello>, and C<$b> is
44- C<hello, hello>. The technique is called "variable interpolation" in Perl.
45- It effectively executes
46- the string concatenation.
50+ Here we use the value of the existing Nginx variable C<$a> to construct the
51+ value for the variable C<$b>. So after these two directives complete execution,
52+ the value of C<$a> is C<hello>, and that of C<$b> is C<hello, hello>. This technique is
53+ called "variable interpolation" in the Perl world, and it makes ad-hoc string
54+ concatenation operators largely unnecessary. Let's use the same term for
55+ the Nginx world from now on.
4756
48- Let's have a look at another example:
57+ Let's see another complete example:
4958
5059 :nginx
5160 server {
@@ -57,34 +66,55 @@ Let's have a look at another example:
5766 }
5867 }
5968
60- The example omits the outter C<http> directive and C<events> directive
61- in F<nginx.conf>. With
62- the HTTP client utility C<curl>, we can issue a HTTP request to C</test>
63- from command line and
64- obtain following result:
69+ This example omits the C<http> and C<events> configuration blocks in
70+ the outer-most scope for brevity. When we request this C</test> interface via
71+ C<curl>, an HTTP client utility, on the command line, we get
6572
6673 :bash
6774 $ curl 'http://localhost:8080/test'
6875 foo: hello
6976
70- Here we use 3rd party module L<ngx_echo> and its command L<ngx_echo/echo>
71- to print the value
72- of variable C<$foo> as HTTP response.
77+ Here we use the L<ngx_echo/echo> directive of the 3rd party module L<ngx_echo>
78+ to print out the value of the C<$foo> variable as the HTTP response.
7379
74- We can assert that L<ngx_echo/echo> supports "variable interpolation",
75- yet we must not take it
76- for granted, since not all the variable commands supports "variable interpolation"
80+ Apparently the arguments of the L<ngx_echo/echo> directive do support
81+ "variable interpolation", but we
82+ cannot take it
83+ for granted for other directives, because not all the configuration directives
84+ support "variable interpolation"
7785and it is
78- in fact up to the module's implementation.
86+ in fact up to the implementation of the directive in that module. Always look
87+ up the documentation to be sure.
88+
89+ === Escaping "$" ===
90+
91+ We've already learned that the C<$> character is special and it serves as the
92+ variable name prefix, but now consider that we want to output a literal C<$>
93+ character via the L<ngx_echo/echo> directive. The following naive example does
94+ not work at all:
95+
96+ ? :nginx
97+ ? location /t {
98+ ? echo "$";
99+ ? }
100+
101+ We will get the following error message while loading this configuration:
79102
80- Is there any way to escape C<$> so that it is no more than a typical dollar
81- sign by using
82- L<ngx_echo/echo> ? The answer is negative (the answer still holds in the
103+ [emerg] invalid variable name in ...
104+
105+ Obviously Nginx is trying to parse C<$"> as a variable name. Is there a way to
106+ escape C<$> in the string literal? The answer is "no" (it is still the case in
107+ the
83108latest Nginx stable
84- release C<1.0.10>. Luckily this can be done by other module commands, which
85- designate C<$> value
86- as a Nginx variable, then the variable can be used in L<ngx_echo/echo>,
87- example:
109+ release C<1.2.7>) and I have been hoping that we could write something like
110+ C<$$> to obtain a literal C<$>.
111+
112+ Luckily, workarounds do exist and here is one proposed by Maxim Dounin: first
113+ we assign to a variable a literal string containing the dollar sign character
114+ via a configuration directive that does I<not> support "variable interpolation"
115+ (remember that not all the directives support "variable interpolation"?), and
116+ then use L<ngx_echo/echo> to print out this variable's value. Here is such an
117+ example to demonstrate the idea:
88118
89119 :nginx
90120 geo $dollar {
@@ -99,30 +129,32 @@ example:
99129 }
100130 }
101131
102- testing result is following :
132+ Let's test it out:
103133
104134 :bash
105135 $ curl 'http://localhost:8080/test'
106136 This is a dollar sign: $
107137
108- The built-in module L<ngx_geo> and its command L<ngx_geo/geo> are used
109- to initialize
110- variable C<$dollar> with string C<"$">, thereafter variable C<$dollar>
138+ Here we make use of the L<ngx_geo/geo> directive of the standard module
139+ L<ngx_geo> to initialize the
140+ C<$dollar> variable with the string C<"$">, thereafter variable C<$dollar>
111141can be used
112- for circumstances asking for a dollar sign. Actually, the typical scenario
113- L<ngx_geo>
114- is applied for, is to assign Nginx variable by taking into account the
115- request client
116- IP addresses. For above specific example, it is used to initialize C<$dollar>
142+ wherever we need a literal dollar sign. This works because the L<ngx_geo/geo>
143+ directive does not
144+ support "variable interpolation" at all. However, the L<ngx_geo> module
145+ is designed to set an Nginx variable to different values according to the
146+ remote client
147+ address. In the sample above, we just abuse it to initialize the C<$dollar>
117148variable
118- with the dollar sign string unconditionally.
149+ with the string C<"$"> unconditionally.
119150
120- Attention, "variable interpolation" has a special case, where the variable
121- name itself
122- cannot be delimited from the rest of the string (such as it is right in
123- front of letter,
124- digit or underscore) Hence a special syntax is needed to handle the case,
125- as following:
151+ === Disambiguating Variable Names ===
152+
153+ There is a special case in "variable interpolation": the variable
154+ name may be followed directly by characters that can themselves appear in variable names (like
155+ letters, digits, and underscores).
156+ In such cases we can use a special notation to disambiguate the variable name
157+ from the subsequent literal characters:
126158
127159 :nginx
128160 server {
@@ -134,27 +166,32 @@ as following:
134166 }
135167 }
136168
137- In the example, variable C<$first> is concatenated with C<world>. If it
138- is written
139- directly as C<"$firstworld">, Nginx's variable interpolation tries to devalue
140- variable
141- C<$firstworld> instead of C<$first>. To fix this problem, curly bracket
142- can be used
143- together with C<$>, such as C<${first}>. Above example has following result:
169+ Here the variable C<$first> is concatenated with the literal string C<world>.
170+ If it
171+ were written
172+ directly as C<"$firstworld">, Nginx's "variable interpolation" engine (also
173+ known as the "script engine") would try to access the variable
174+ C<$firstworld> instead of C<$first>. To resolve the ambiguity, curly brackets
175+ must be used
176+ after the C<$> prefix, as in C<${first}>. Let's test this sample:
144177
145178 :bash
146179 $ curl 'http://localhost:8080/test'
147180 hello world
148181
149- Command L<ngx_rewrite/set> (and Command L<ngx_geo/geo>) not only initialize
150- a variable,
151- effectively it firstly declares the variable. Which means, if the variable
152- is not declared yet,
153- it is declared automatically (then initialized). In the example, if variable
154- C<$a> is not declared,
155- C<set> declares the variable at first hand. If variables are not declared,
156- Nginx cannot devalue
157- them, another example:
182+ == Variable Declaration or Creation ==
183+
184+ In languages like C/C++, variables must be declared (or created) before they
185+ can be used so that the compiler can allocate storage and perform type checking
186+ at compile-time. Similarly, Nginx creates all the Nginx variables while loading
187+ the configuration file (or in other words, at "configuration time"), so Nginx
188+ variables are also required to be declared somehow.
189+
190+ Fortunately the L<ngx_rewrite/set> directive and the L<ngx_geo/geo> directive
191+ mentioned above do have the side effect of declaring or creating Nginx
192+ variables that they will assign values to later at "request time". If we do not
193+ declare a variable this way and use it directly in, say, the L<ngx_echo/echo>
194+ directive, we will get an error. For example,
158195
159196 :nginx
160197 ? server {
@@ -165,25 +202,26 @@ them, another example:
165202 ? }
166203 ? }
167204
168- Nginx aborts loading configuration:
205+ Here we do not declare the C<$foo> variable and access its value directly in
206+ L<ngx_echo/echo>. Nginx will just refuse to load this configuration:
169207
170208 [emerg] unknown "foo" variable
171209
172- Yes, the server cannot even be started!
210+ Yes, we cannot even start the server!
211+
212+ Nginx variable creation and assignment happen
213+ at completely different phases along the timeline.
214+ Variable creation only occurs when Nginx loads its configuration. On the other
215+ hand, variable assignment occurs when requests are actually
216+ being handled. This also means that we can never create new Nginx variables at
217+ "request time".
173218
174- More importantly, Nginx variable declaration and initialization happens
175- at different phases in the timeline.
176- Variable declaration only occurs when Nginx loads its configuration, in
177- other words, when Nginx is started.
178- On the other hand, variable initialization occurs when actual request is
179- being handled. Consequently, server
180- fails bootstrap if variable is not declared, further more, new Nginx variables
181- cannot be declared dynamically in
182- the run time.
219+ == Variable Scope ==
183220
184- As soon as a variable is declared in Nginx, its scope is the entire configuration,
221+ Once an Nginx variable is created, it is visible to the entire configuration,
185222regardless of the location
186- it is referenced, even for different virtual server directives. Here is
223+ it is referenced, even across different virtual server configuration blocks.
224+ Here is
187225an example:
188226
189227 :nginx
@@ -200,11 +238,13 @@ an example:
200238 }
201239 }
202240
203- Variable C<$foo> is declared by command C<set> within C<location /bar>,
204- as variable
205- visibility is the entire configuration. It can be referenced in C<location
241+ Here the variable C<$foo> is created by the L<ngx_rewrite/set> directive within
242+ C<location /bar>,
243+ and this variable is visible to the entire configuration, therefore we can
244+ reference it in C<location
206245/foo> without
207- causing any error, following are the location outcomes respectively:
246+ worries. Below is the result of testing these two interfaces via the C<curl>
247+ tool.
208248
209249 :bash
210250 $ curl 'http://localhost:8080/foo'
@@ -216,21 +256,28 @@ causing any error, following are the location outcomes respectively:
216256 $ curl 'http://localhost:8080/foo'
217257 foo = []
218258
219- As we can tell, command C<set> is executed within C<location /bar>, so
220- the variable is only initialized when C</bar>
221- is requested. If C</foo> is requested directly, variable C<$foo> has an
222- empty value. Default value is an empty string
223- if Nginx variable is not initialized.
224-
225- The example carries another important feature, i.e. although variable scope
226- is the entire configuration, every request
227- has its own copies of the declared variables. In the example, variable
228- C<$foo> is initialized with value C<32> when C</bar>
229- is requested, but it remains empty in the subsequent request to C</foo>
230- since every request has their own copy of variables
231-
232- This is a common pitfall many Nginx newbie stumbles, which is to think
233- Nginx variable as "global variable" or configuration
234- settings that are shared for the entire server life time. In fact, variables
235- cannot last in between different requests.
259+ We can see that the assignment operation is only performed in requests that
260+ access C<location /bar>, since the corresponding L<ngx_rewrite/set> directive
261+ is only used in that location. When requesting the C</foo> interface, we always
262+ get an empty value for the C<$foo> variable because that is what we get when
263+ accessing an uninitialized variable.
264+
265+ Another important behavior that we can observe from this example is that even
266+ though the scope of Nginx variables is the entire configuration, each request
267+ does have its own version of all those variables. Or in other words, each
268+ request has its own copy of value containers for all variables. Requests do not
269+ interfere with each other even if they are referencing a variable with the same
270+ name. This is very much like local variables in C/C++ function bodies. Each
271+ invocation of the C/C++ function does use its own version of those local
272+ variables.
273+
274+ For instance, in this sample, we request C</bar> and the variable C<$foo> gets
275+ the value C<32>, which does not affect the value of C<$foo> in subsequent
276+ requests to C</foo> (it is still uninitialized!), because they correspond to
277+ different value containers.
278+
279+ One of the most common mistakes for Nginx newcomers is to regard Nginx
280+ variables as something shared among all the requests. Even though the scope of
281+ Nginx variables goes across configuration blocks, it never goes beyond request
282+ boundaries. Essentially we have two different kinds of scope here.
236283