11= Nginx Variables (07) =
22
3- We have learnt in L<vartut/ (01)>, that Nginx variables could only
4- be strings. We are in fact, not entirely correct because variables
5- can have non-values. There are 2 kinds of non-values in Nginx, one
6- is "invalid" value, another is "not found" value.
7-
8- For example, if Nginx variable C<$foo> is declared but not initialized
9- it has an "invalid" value. Whereas if there exists no C<XXX> parameter
10- in the current request URL, builtin variable C<$arg_XXX> has a "not found"
11- value.
12-
13- Nginx special values, such as "invalid" value or "not found" value, are
14- totally different from empty string (""). Like C<undefined> and C<null>
15- found in JavaScript, or C<nil> found in Lua, these non-values are not
16- numerical value C<0>, they are not boolean value C<false> either. In fact
17- the C<NULL> found in SQL is an equivalent element.
18-
19- Although back in L<vartut/ (01)>, the uninitialized value becomes empty
20- string in "variable interpolation" via command L<ngx_rewrite/set>. This
21- is
22- because L<ngx_rewrite/set> hooks a "get handler" for the variable it declares,
23- and the handler turns "invalid" value into empty string. Let's review the
24- example in L<vartut/ (01)> for this assertion:
3+ == Special Value "Invalid" and "Not Found" ==
4+
5+ We have mentioned that the values of Nginx variables can only be of one single
6+ type, that is, the string type, but variables could also have no meaningful
7+ values
8+ at all. Variables without any meaningful values still take a special value
9+ though.
10+ There are two possible special values: "invalid" and "not found".
11+
12+ For example, when a user variable C<$foo> is created but not assigned yet,
13+ C<$foo> takes the special value of "invalid". And when the current URL
14+ query string does not have the C<XXX> argument at all, the built-in variable
15+ L<$arg_XXX> takes the special value of "not found".
16+
17+ Both "invalid" and "not found" are special values, completely different from an
18+ empty string value (C<"">). This is very similar to those distinct special
19+ values in some dynamic programming languages, like C<undef> in Perl, C<nil> in
20+ Lua, and C<null>
21+ in JavaScript.
22+
23+ We have seen earlier that an uninitialized variable is evaluated to an
24+ empty
25+ string when used in an interpolated string. Its real value, however, is not an
26+ empty
27+ string at all. It is the "get handler" registered by the L<ngx_rewrite/set>
28+ directive that automatically converts the "invalid" special value into an empty
29+ string. To verify this, let's return to the example we have discussed before:
2530
2631 :nginx
2732 location /foo {
@@ -33,93 +38,80 @@ example in L<vartut/ (01)> for this assertion:
3338 echo "foo = [$foo]";
3439 }
3540
36- Again to make it clearer, the C<server> directive is omitted. In this example
37- command L<ngx_rewrite/set> implicitly declares variable C<$foo> within
38- C<location /bar>
39- Then we print the uninitialized C<$foo> within C<location /foo> by using
40- command
41- L<ngx_echo/echo>. The result is following when C<location /foo> was requested:
41+ When accessing C</foo>, the user variable C<$foo> is uninitialized when used in
42+ the interpolated string for the L<ngx_echo/echo> directive. The output shows
43+ that the variable is evaluated to an empty string:
4244
4345 :bash
4446 $ curl 'http://localhost:8080/foo'
4547 foo = []
4648
47- If we look at the output, uninitialized variable C<$foo> is equivalent
48- to an empty
49- string. However if we look further into Nginx error log (usually the file
50- name is F<error.log>)
51- it has a warning message when the request is handled:
49+ From the output, the uninitialized C<$foo> variable behaves just like
50+ taking an empty string value. But careful readers should have already noticed
51+ that, for the request above, there is a warning in the Nginx error log file
52+ (which is F<logs/error.log> by default):
5253
5354 [warn] 5765#0: *1 using uninitialized "foo" variable, ...
5455
55- How is the warning generated ? The answer is the "get handler" hooked to
56- variable C<$foo>
57- when it is declared by command L<ngx_rewrite/set>. By the time command
58- L<ngx_echo/echo>
59- gets executed within C<location /foo>, it needs to evaluate its parameter
60- C<"foo = [$foo]">
61- this is where "variable interpolation" is happening and variable C<$foo>
62- is devalued,
63- Nginx first checks the value container, which has a special "invalid" value,
64- so it decides
65- to execute the variable's "get handler". The handler prints a warning message
66- in Nginx's error
67- log, then returns and caches an empty string as the value of C<$foo>.
68-
69- You might have perceived, this is exactly the same process with which those
70- builtin variable
71- works, when it opt-in a value container as cache. Command L<ngx_rewrite/set>
72- uses the very
73- mechanism to handle those uninitialized Nginx variables. Be careful though,
74- only special value
75- "invalid" will trigger Nginx to execute its "get handler", another special
76- value "no found" won't.
77-
78- The warning message is helpful, as it tells we might have miss spelled
79- variables in Nginx
80- configuration, or we might have used uninitialized variables under an incorrect
81- context. Since
82- cache exists, the warning won't repeat itself for a request life cycle.
83- Besides, the warning
84- can be turned off by module L<ngx_rewrite> and its command L<ngx_rewrite/uninitialized_variable_warn>
85-
86- As we said earlier, builtin variable L<$arg_XXX> has a special value "not
87- found" when
88- the request URL has no C<XXX> parameter. However we cannot as easily distinguish
89- it from
90- an empty string, using Nginx native syntax.
56+ Who on earth generates this warning? The answer is the "get handler" of C<$foo>,
57+ registered by the L<ngx_rewrite/set> directive. When C<$foo> is read, Nginx
58+ first checks the value in its container but sees the "invalid" special value,
59+ then Nginx decides to continue running C<$foo>'s "get handler", which first
60+ prints the warning (as shown above) and then returns an empty string value,
61+ which thereafter gets cached in C<$foo>'s value container.
62+
63+ Careful readers should have identified that this process for user variables is
64+ exactly the same as the mechanism we discussed earlier for built-in variables
65+ involving "get handlers" and result caching in value containers. Yes, it is the
66+ same mechanism in action. It is also worth noting that only the "invalid"
67+ special value will trigger the "get handler" invocation in the Nginx core while
68+ "not found" will not.
69+
70+ The warning message above usually indicates a typo in the variable name or
71+ misuse of uninitialized variables, not necessarily in the context of an
72+ interpolated string. Because of the existence of value caching in the variable
73+ container, this warning will not get printed multiple times in the lifetime of
74+ the current request. Also, the L<ngx_rewrite> module provides the
75+ L<ngx_rewrite/uninitialized_variable_warn> directive for disabling this warning
76+ altogether.
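
As a minimal sketch of the directive just mentioned, the warning could be
silenced for a single location as shown below. This assumes that C<$foo> is
still declared by a L<ngx_rewrite/set> directive elsewhere in the
configuration, as in the earlier example; the choice of C<location /foo> is
only illustrative.

    :nginx
    location /foo {
        # suppress the "using uninitialized ... variable" warning
        # for requests handled by this location only
        uninitialized_variable_warn off;

        echo "foo = [$foo]";
    }

Because the warning is often the only hint of a misspelled variable name, it
is probably safer to turn it off in as narrow a scope as possible.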
77+
78+ === Testing Special Values of Nginx Variables in Lua ===
79+
80+ As we have just mentioned, the built-in variable L<$arg_XXX> takes the special
81+ value "not found" when the URL argument C<XXX> does not exist, but
82+ unfortunately, it is not easy to distinguish it from the empty string value
83+ directly in the Nginx configuration file, for example:
9184
9285 :nginx
9386 location /test {
9487 echo "name: [$arg_name]";
9588 }
9689
97- We print variable C<$arg_name> meanwhile not to provide C<name> parameter
98- in the request
90+ Here we intentionally omit the URL argument C<name> in our request:
9991
10092 :bash
10193 $ curl 'http://localhost:8080/test'
10294 name: []
10395
104- Special value "not found" cannot be asserted in the output, it looks like
105- an empty string.
106- The "variable interpolation" of Nginx simply ignores "not found" when it
107- is evaluated.
96+ We can see that we are still getting an empty string value, because this time
97+ it is the Nginx "script engine" that automatically converts the "not found"
98+ special value to an empty string when performing variable interpolation.
10899
109- So how do we trace "not found" ? What exactly we can do to distinguish
110- it from an empty
111- string ? Obviously, URL parameter C<name> has an empty string in the request
112- below:
100+ Then how can we test the special value "not found"? Or in other
101+ words, how can we distinguish it from normal empty string values? Obviously, in
102+ the following example, the URL argument C<name> does take an ordinary value,
103+ which is a
104+ true empty string:
113105
114106 :bash
115107 $ curl 'http://localhost:8080/test?name='
116108 name: []
117109
118- We cannot yet tell any differences from the earlier example.
110+ But we cannot really differentiate this from the earlier case that does not
111+ mention the C<name> argument at all.
119112
120- Good news is, with the help of 3rd party module L<ngx_lua>, it can be done
121- in
122- lua code. Now check example below:
113+ Luckily, we can easily achieve this in Lua by means of the 3rd-party module
114+ L<ngx_lua>. Please look at the following example:
123115
124116 :nginx
125117 location /test {
@@ -132,70 +124,73 @@ lua code. Now check example below:
132124 ';
133125 }
134126
135- This configuration is pretty close to the earlier one, except
136- we have used module L<ngx_lua> and its command L<ngx_lua/content_by_lua>,
137- to check Nginx variables and their possible special values using lua code.
138- Specifically, we print C<name: missing> if variable C<$arg_name> has
139- a non-value "not found" or "invalid":
127+ This example is very close to the previous one in terms of functionality.
128+ We use the L<ngx_lua/content_by_lua> directive from the L<ngx_lua> module to
129+ embed a small piece of our own Lua code to test against the special value of
130+ the Nginx variable C<$arg_name>. When C<$arg_name> takes a special value
131+ (either "not found" or "invalid"), we will get the following output when
132+ requesting C</test>:
140133
141134 :bash
142- curl 'http://localhost:8080/test'
135+ $ curl 'http://localhost:8080/test'
143136 name: missing
144137
145- Let me briefly introduce module L<ngx_lua>, the module embeds lua interpreter
146- (standard or L<LuaJIT|http://luajit.org/luajit.html> in Nginx core, so
147- that
148- lua programs can be executed directly inside Nginx . The lua programs can
149- be
150- written right away in Nginx configuration or be written in external F<.
151- lua>
152- file and loaded via Nginx command referencing the F<.lua> path.
153-
154- Back to our example, Nginx variables are referenced by C<ngx.var> from
155- within
156- lua, it is bridged by module L<ngx_lua>. For example, Nginx variable C<$VARIABLE>
157- can be written as L<ngx_lua/ngx.var.VARIABLE> in lua code. When Nginx variable
158- C<$arg_name> has non-value (special value "invalid" or "not found"), the
159- corresponding
160- variable C<ngx.var. arg_name> is C<nil> in lua. Further more, module L<ngx_lua>
161- provides lua function L<ngx_lua/ ngx.say>, functionally it is equivalent
162- to
163- module L<ngx_echo> and its command L<ngx_echo/echo>.
164-
165- Now if we request with C<name> parameter being an empty string, the output
166- becomes
167- different:
138+ This is our first time meeting the L<ngx_lua> module, which deserves a brief
139+ introduction. This module embeds the Lua language interpreter (or LuaJIT's
140+ Just-in-Time compiler) into the Nginx core, to allow Nginx users to directly run
141+ their own Lua programs inside the server. The user can choose to insert
142+ her Lua code into different running phases of the server, to fulfill different
143+ requirements. Such Lua code is either specified directly as literal strings in
144+ the Nginx
145+ configuration file, or reside in external F<.lua> source files (or Lua binary
146+ bytecode
147+ files) whose paths are specified in the Nginx configuration.
148+
149+ Back to our example, we cannot directly write something like C<$arg_name> in
150+ our Lua code. Instead, we reference Nginx variables in Lua by means of the
151+ C<ngx.var> API provided by the L<ngx_lua> module. For example, to reference the
152+ Nginx variable C<$VARIABLE> in Lua, we just write L<ngx_lua/ngx.var.VARIABLE>.
153+ When the Nginx variable C<$arg_name> takes the special value "not found" (or
154+ "invalid"), C<ngx.var.arg_name> is evaluated to the C<nil> value in the Lua
155+ world. It is also worth noting that we use the Lua function L<ngx_lua/ngx.say>
156+ to print out the response body contents, which is functionally equivalent to
157+ the L<ngx_echo/echo> directive we are already very familiar with.
158+
159+ If we provide a C<name> URI argument that takes an empty value in the request,
160+ the output is now very different:
168161
169162 :bash
170163 $ curl 'http://localhost:8080/test?name='
171164 name: []
172165
173- In this case, Nginx variable C<$arg_name> is an empty string, which
174- is neither "not found" nor "invalid", so Lua code prints empty string
175- "" for C<ngx.var.arg_name>. Apparently we have distinguished it from
176- Lua C<nil>
166+ In this test, the value of the Nginx variable C<$arg_name> is a true empty
167+ string, neither "not found" nor "invalid". So in Lua, the expression
168+ C<ngx.var.arg_name> evaluates to the Lua empty string (C<"">), clearly
169+ distinguished from the Lua C<nil> value in the previous test.
177170
178- The distinction becomes significant in a few scenarios. For example,
179- a web service might filter its returns by C<name> by checking if
180- C<name> parameter exists in URL parameters, even if C<name> has an
181- empty string, it still can be used in a filtering operation.
171+ This differentiation is important in certain application scenarios. For
172+ instance, some web services have to decide whether to use a column value to
173+ filter the data set by checking the I<existence> of the corresponding URI
174+ argument. For these services, when the C<name> URI argument is absent, the
175+ whole data set is just returned; when the C<name> argument takes an empty
176+ value, however, only those records that take an empty value are returned.
182177
183- Admittedly, there are some restrictions with builtin variable L<$arg_XXX>
184- as we can see from our request to C<location /test>:
178+ It is worth mentioning a few limitations in the standard L<$arg_XXX> variable.
179+ Consider using the following request to test C</test> in our previous example
180+ using Lua:
185181
186182 $ curl 'http://localhost:8080/test?name'
187183 name: missing
188184
189- In this case, C<$arg_name> is still computed as "not found" non-value,
190- which
191- is counter common sense. Besides, L<$arg_XXX> only resolutes to the first
192- C<XXX>
193- parameter if there are multiple C<XXX> URL parameters, the rest are discarded:
185+ Now the C<$arg_name> variable still reads the "not found" special value, which
186+ is apparently counter-intuitive. Additionally, when multiple URI arguments with
187+ the same name are specified in the request, L<$arg_XXX> just
188+ returns the first value of the argument, discarding other values silently:
194189
195190 :bash
196191 $ curl 'http://localhost:8080/test?name=Tom&name=Jim&name=Bob'
197192 name: [Tom]
198193
199- To fix these defects, one can use module L<ngx_lua> and its lua function
200- L<ngx_lua/ngx.req.get_uri_args> in lua code .
194+ To solve these problems, we can directly use the Lua function
195+ L<ngx_lua/ngx.req.get_uri_args> provided by the L<ngx_lua> module.
201196
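The following is only a rough sketch of how L<ngx_lua/ngx.req.get_uri_args>
might be used to handle the cases above. The location name C</args> is
hypothetical, and the sketch relies on the behavior documented for the
L<ngx_lua> module: an argument given without C<=value> is returned as the
boolean C<true>, and an argument that occurs several times is returned as a
Lua table holding all of its values.

    :nginx
    location /args {
        content_by_lua '
            -- fetch all URI arguments as a Lua table
            local args = ngx.req.get_uri_args()
            local name = args.name

            if name == nil then
                -- the argument does not appear in the URL at all
                ngx.say("name: missing")
            elseif name == true then
                -- the argument appears without "=", as in /args?name
                ngx.say("name: present without a value")
            elseif type(name) == "table" then
                -- the argument appears more than once
                for i, v in ipairs(name) do
                    ngx.say("name[", i, "]: ", tostring(v))
                end
            else
                -- an ordinary string value, possibly empty
                ngx.say("name: [", name, "]")
            end
        ';
    }

Only double quotes are used inside the Lua code here, since the whole chunk is
wrapped in a single-quoted Nginx string literal.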