Domains3 #3109

isaacs · 2012-04-14T04:11:47Z

Submitted for your review.

See the included tests and docs for examples of usage. Most of the time, the user doesn't have to do anything special to take advantage of them: all callbacks that are assigned as part of a ReqWrap or timer are automatically attached, as is anything called by EventEmitter.prototype.emit.

To enable domains, the domain module must be loaded at least once. require('domain') sets the flag on the events module to tell it to register new EventEmitters with the active domain. All the other behavior is based on the presence of an active domain at process.domain.

A domain is made "active" by being entered. If a timer or EventEmitter is explicitly added to a domain (or implicitly added by being created while a domain is active), then that domain will be activated for the callback resulting from that timer or EventEmitter. Same for ReqWraps, however since they are not exposed to the user, there is no way to explicitly add them; they can only be implicitly bound to a domain.

The implementation works by hooking into the process.on('uncaughtException') event. This has the advantage of catching situations that may escape the current call stack, so is a bit more reliable than wrapping callbacks in a try/catch. However, it still depends on using MakeCallback as the singe entry point from C++ to JavaScript, where we use a TryCatch in C++ to route errors to the FatalException exit point.

Performance

There is a slight performance increase owing to the MakeCallback refactoring, but it's not very significant. The cleanup I did was rather naive, and further performance benefits can perhaps be attained by a more thorough cleanup.

Running a http_simple benchmark against /bytes/1024 shows that they're in roughly the same ballpark. These are pretty typical results, and no significant degradation can be detected.

Note that the http_simple.js benchmark in this branch actually creates a domain and adds both the request and response objects to it explicitly, so it is a meaningful real-world use of domains. Of course, if errors are encountered, then this causes a bit of a performance hit, but otherwise, any impact is less than ambient noise.

On master (that is, 93cefab)

$ i=0; while [ $i -lt 10 ] ; do echo -n "$i "; TYPE=bytes LENGTH=1024 NODE_USE_DOMAINS=1 bash benchmark/http.sh 2>&1 | grep Requests ; let "i++" ; done
0 Requests per second:    3403.72 [#/sec] (mean)
1 Requests per second:    2886.88 [#/sec] (mean)
2 Requests per second:    2879.11 [#/sec] (mean)
3 Requests per second:    2918.01 [#/sec] (mean)
4 Requests per second:    3053.11 [#/sec] (mean)
5 Requests per second:    2821.80 [#/sec] (mean)
6 Requests per second:    2883.54 [#/sec] (mean)
7 Requests per second:    2734.61 [#/sec] (mean)
8 Requests per second:    2914.67 [#/sec] (mean)
9 Requests per second:    2813.31 [#/sec] (mean)

After the MakeCallback refactoring, but before the actual domain work:

[gh:node/pre-domain][email protected]:~/dev/js/node-master v0.7.8-pre 4.1.0(1)-release
$ i=0; while [ $i -lt 10 ] ; do echo -n "$i "; TYPE=bytes LENGTH=1024 NODE_USE_DOMAINS=1 bash benchmark/http.sh 2>&1 | grep Requests ; let "i++" ; done
0 Requests per second:    3957.63 [#/sec] (mean)
1 Requests per second:    3145.52 [#/sec] (mean)
2 Requests per second:    3086.34 [#/sec] (mean)
3 Requests per second:    2987.89 [#/sec] (mean)
4 Requests per second:    2718.45 [#/sec] (mean)
5 Requests per second:    3069.57 [#/sec] (mean)
6 Requests per second:    2838.25 [#/sec] (mean)
7 Requests per second:    2851.08 [#/sec] (mean)
8 Requests per second:    2842.76 [#/sec] (mean)
9 Requests per second:    3137.16 [#/sec] (mean)

This branch:

$ i=0; while [ $i -lt 10 ] ; do echo -n "$i "; TYPE=bytes LENGTH=1024 NODE_USE_DOMAINS=1 bash benchmark/http.sh 2>&1 | grep Requests ; let "i++" ; done
0 Requests per second:    3546.52 [#/sec] (mean)
1 Requests per second:    2911.57 [#/sec] (mean)
2 Requests per second:    2925.34 [#/sec] (mean)
3 Requests per second:    3010.00 [#/sec] (mean)
4 Requests per second:    2982.31 [#/sec] (mean)
5 Requests per second:    2915.03 [#/sec] (mean)
6 Requests per second:    2951.25 [#/sec] (mean)
7 Requests per second:    2912.61 [#/sec] (mean)
8 Requests per second:    2914.82 [#/sec] (mean)
9 Requests per second:    2755.61 [#/sec] (mean)

Shortcomings

The API feels kind of nice, but it's a bit rough still. I think it's fine for v0.8, and we can refine it further in v0.9 and beyond once users have a chance to play with it. We should not bikeshed it too much at this point. That being said, the method names are sort of weird, and some other things can probably be improved, so if you have any ideas, please share them.
The domain.dispose() mechanism is just kludgey. We really need a single way to forcibly shut down handles, requests, and streams, so that we don't leave timers and fd's dangling. Also on the subject of debuggability, it would be nice to ensure that we don't close() a file descriptor while there are still outstanding fs operations on it. This consistency/cleanup can wait, however, and we can make v0.9 all about API polishing.
The domain only knows about the items that are explicitly added to it. It's too expensive to push every EventEmitter onto the active domain's member list. (It's also extremely memory intensive, though WeakMaps could perhaps be a solution.) So, when you call domain.dispose(), it disposes the domain itself (ie, renders it a no-op) but it can only .destroy() and clearTimeout the items that have been explicitly added by calling domain.add(emitter), not those that are implicitly bound. It would be nice to find some clever way to have the domain cheaply know which items were created during its reign, but I didn't find one.

koichik · 2012-04-14T08:04:11Z

example:

var fs = require('fs');
var domain = require('domain');

var d = domain.create();
d.on('error', function(err) {
  console.log('ERROR', err);
});

d.run(function() {
  foo(__filename, d.bind(function(err, data) {
    console.log(data);
  }));
});

function foo(path, cb) {
  // A) here is a part of domain
  fs.readFile(path, function(err, data) {
    // B) here is not a part of domain
    if (err) throw err;
    cb(null, data.toString());
  });
}

Please imagine that foo is a function from an existing library. When an exception is thrown form A), it is routed to domain's error listener. But from B), it is not routed. I think that domains by this proposal cannot cooperate with a third-party modules or an existing libraries... I read "Shortcomings", but I still think that it is crucial to propagate implicit domain's context.

isaacs · 2012-04-14T15:57:49Z

@koichik No, in the B) section of your example, it would still be part of the domain. ReqWraps are bound to the domain that is active at the time of their creation, and so ReqWrap callbacks are run in the context of the domain.

https://gist.github.com/2385382

isaacs · 2012-04-14T16:04:50Z

Also, fs.readFile uses an EventEmitter internally, so it would also be implicitly bound to the active domain at the time of its creation.

koichik · 2012-04-14T16:35:58Z

@isaacs - Sorry, this is my mistake. I cannot reproduce what I wrote in the previous comment.

koichik · 2012-04-14T17:21:09Z

I understood my failure. First time, I used process.nextTick() instead of fs.readFile().

function foo(path, cb) {
  // here is a part of domain
  process.nextTick(function() {
    // here is not a part of domain
    throw new Error(); // uncaught by domain
  });
}

After changing process.nextTick to fs.readFile(), I made the mistake in the check of a log.

isaacs · 2012-04-14T18:28:42Z

Ah! Yes! It seems that nextTick is a hole. I'll look into that. Nice find!

isaacs · 2012-04-14T18:33:01Z

Er, wait, no, it isn't. I spoke too soon. nextTick still goes through the same entry point. Updated the gist: https://gist.github.com/2385382

isaacs · 2012-04-14T18:56:08Z

No, I didn't speak too soon :) It's only working by accident. There is indeed a possibility for escape here.

I'll fix it soon.

koichik · 2012-04-14T19:00:14Z

@isaacs - Confirmed, the gist works well. However, my code does not work.

var domain = require('domain');

var d = domain.create();
d.on('error', function(err) {
  console.log('Domain caught error', err);
});

d.run(function() {
  foo();
});

function foo() { 
  // here is a part of domain
  process.nextTick(function() {
    // here is not a part of domain
    throw new Error('nexttick in foo');
  });
}

result:

$ ~/git/isaacs/node/node d.js

domain.js:66
    throw er;
          ^
Error: nexttick in foo
    at Array.0 (/tmp/d.js:17:11)
    at EventEmitter._tickCallback (node.js:238:41)

In this case, there is no active domain within process.nextTick().

koichik · 2012-04-14T19:02:25Z

I was too late :)

isaacs · 2012-04-14T19:21:35Z

This would close the leak:

diff --git a/src/node.js b/src/node.js
index 3322df6..d0ea1b4 100644
--- a/src/node.js
+++ b/src/node.js
@@ -249,6 +249,7 @@
     };

     process.nextTick = function(callback) {
+      if (process.domain) callback = process.domain.bind(callback);
       nextTickQueue.push(callback);
       process._needTickCallback();
     };

However, the cleaner approach would be to push objects rather than functions into the tick queue, and assign the active domain to them.

Also, it would be nice if the MakeCallback function could check the _disposed member of the relevant domain, and not fire the callback at all in that case.

isaacs · 2012-04-14T19:37:12Z

c9c44e0 fixes the issue that @koichik found.

7232710008bdfabd17b6328e755dcf6d4b745929 seems somewhat controversial to me. It prevents callbacks from firing at all if they are bound to a domain that has been disposed. I could see this leading in some rare cases to memory leaks, since there may be IO that needs to be cleaned up. On the other hand, it's the responsibility of the user to clean up their stuff when they call domain.dispose().

isaacs · 2012-04-14T19:40:16Z

Er, there's a bug in c9c44e0. Fixed by rebasing to ab886be and 054740e.

shigeki · 2012-04-15T14:43:36Z

It seems that setTimeout() causes the leak of domain.members array because the object of timers in domain.members are not removed after ontimeout.
The bug example shows as below in which setTimeout() is called to check Date field cache.
I have not read implementation of Domains3 in detail yet but this quick fix 82d28f3 seems working fine. Please review it.

// test-domain-leak.js
var http = require('http');
var domain = require('domain');
var d = domain.create();
var port = 8000;
d.on('error', function(err) {
  console.log('domain.members.length:', err.domain.members.length);
});
var server = http.createServer(d.bind(function(req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end("Hello");
  throw new Error('error');
})).listen(port, function() {
  setInterval(function() {
    var options = {
      host: 'localhost',
      port: port,
      path: '/'
    };
    http.get(options);
  }, 1000);
});

> ./node test-domain-leak.js
domain.members.length: 1
domain.members.length: 2
domain.members.length: 3
domain.members.length: 4

isaacs · 2012-04-15T21:07:08Z

Good catch, @shigeki. Fixed more simply on 3dbc73f. Timers shouldn't be explicitly bound anyway, only implicitly.

shigeki · 2012-04-16T01:37:35Z

@isaacs Confirmed that setTimeout was fixed. Thanks. But a callback of setInterval() is still out of domain.

// test-setinterval.js
var domain = require('domain');
var d = domain.create();
d.on('error', function(err) {
  console.log(err);
});
function foo(msg) {
  var timer = setInterval(function() {
    clearTimeout(timer);
    throw new Error('error test');
  }, 100);
}
d.run(foo);

unixjp:~/tmp/isaacs/node> ./node test-setinterval.js

domain.js:66
    throw er;
          ^
Error: error test
    at Timer.<anonymous> (/home/ohtsu/tmp/isaacs/node/test-setinterval.js:10:11)
    at Timer.<anonymous> (timers.js:232:14)

Fixed in 317f990 as below. Please review it.

unixjp:~/tmp/isaacs/node> ./node test-setinterval.js
{ [Error: error test]
  domain: { members: [], _events: { error: [Function] } },
  domain_thrown: true }

isaacs · 2012-04-16T02:08:51Z

@shigeki Indeed. I fixed this on c2e4a5dc7530f77a16cfe97793c41e64770630bf by making the key always "domain" rather than "_domain", so that MakeCallback picks it up properly.

shigeki · 2012-04-16T02:20:41Z

That's much more elegant. Confirmed that timers works good.

bnoordhuis · 2012-04-16T19:46:08Z

lib/domain.js

Is swallowing errors a good idea here? Shouldn't they be logged or emitted to somewhere somehow?

Well, they'd always be errors like that you can't .end() a stream that's already closed or something. Ie, useless and extraneous. Until we have a consistent way to shut things down, and to tell if they've been shut down, I think this is about the best we can do.

This is a squashed commit of the main work done on the domains-wip branch. The original commit messages are preserved for posterity: * Implicitly add EventEmitters to active domain * Implicitly add timers to active domain * domain: add members, remove ctor cb * Don't hijack bound callbacks for Domain error events * Add dispose method * Add domain.remove(ee) method * A test of multiple domains in process at once * Put the active domain on the process object * Only intercept error arg if explicitly requested * Typo * Don't auto-add new domains to the current domain While an automatic parent/child relationship is sort of neat, and leads to some nice error-bubbling characteristics, it also results in keeping a reference to every EE and timer created, unless domains are explicitly disposed of. * Explicitly adding one domain to another is still fine, of course. * Don't allow circular domain->domain memberships * Disposing of a domain removes it from its parent * Domain disposal turns functions into no-ops * More documentation of domains * More thorough dispose() semantics * An example using domains in an HTTP server * Don't handle errors on a disposed domain * Need to push, even if the same domain is entered multiple times * Array.push is too slow for the EE Ctor * lint domain * domain: docs * Also call abort and destroySoon to clean up event emitters * domain: Wrap destroy methods in a try/catch * Attach tick callbacks to active domain * domain: Only implicitly bind timers, not explicitly * domain: Don't fire timers when disposed. * domain: Simplify naming so that MakeCallback works on Timers * Add setInterval and nextTick to domain test * domain: Make stack private

bnoordhuis reviewed Apr 16, 2012
View reviewed changes

Report errors thrown from uncaughtException handers

80a55e9

isaacs added 12 commits April 17, 2012 11:56

MakeCallback: Accept Function or Symbol argument

7407be8

MakeCallback: Return the callback return value

ac1aadd

MakeCallback: abort() if not a function

88f94fa

MakeCallback: Use in node_file.cc

91701c2

MakeCallback: Use in node_crypto

35c0cd2

MakeCallback: Use in node_io_watcher

e1dd570

MakeCallback: Use in node_signal_watcher

db45b2c

MakeCallback: Consistent symbol usage

a26bee8

Domain hooks in ReqWrap<T> and MakeCallback

10ce3d1

Add switches to http_simple bench to use domains

45c1d4f

Add Todo comments about domain-ifying crypto

d4ed2e6

isaacs merged commit d4ed2e6 into nodejs:master Apr 17, 2012

Uh oh!

Domains3 #3109

Domains3 #3109

Uh oh!

Conversation

isaacs commented Apr 14, 2012

Performance

Shortcomings

Uh oh!

koichik commented Apr 14, 2012

Uh oh!

isaacs commented Apr 14, 2012

Uh oh!

isaacs commented Apr 14, 2012

Uh oh!

koichik commented Apr 14, 2012

Uh oh!

koichik commented Apr 14, 2012

Uh oh!

isaacs commented Apr 14, 2012

Uh oh!

isaacs commented Apr 14, 2012

Uh oh!

isaacs commented Apr 14, 2012

Uh oh!

koichik commented Apr 14, 2012

Uh oh!

koichik commented Apr 14, 2012

Uh oh!

isaacs commented Apr 14, 2012

Uh oh!

isaacs commented Apr 14, 2012

Uh oh!

isaacs commented Apr 14, 2012

Uh oh!

shigeki commented Apr 15, 2012

Uh oh!

isaacs commented Apr 15, 2012

Uh oh!

shigeki commented Apr 16, 2012

Uh oh!

isaacs commented Apr 16, 2012

Uh oh!

shigeki commented Apr 16, 2012

Uh oh!

bnoordhuis Apr 16, 2012

Choose a reason for hiding this comment

Uh oh!

isaacs Apr 16, 2012

Choose a reason for hiding this comment

Uh oh!

Uh oh!