Skip to content
This repository was archived by the owner on Apr 22, 2023. It is now read-only.

Conversation

isaacs
Copy link

@isaacs isaacs commented Apr 14, 2012

Submitted for your review.

See the included tests and docs for examples of usage. Most of the time, the user doesn't have to do anything special to take advantage of them: all callbacks that are assigned as part of a ReqWrap or timer are automatically attached, as is anything called by EventEmitter.prototype.emit.

To enable domains, the domain module must be loaded at least once. require('domain') sets the flag on the events module to tell it to register new EventEmitters with the active domain. All the other behavior is based on the presence of an active domain at process.domain.

A domain is made "active" by being entered. If a timer or EventEmitter is explicitly added to a domain (or implicitly added by being created while a domain is active), then that domain will be activated for the callback resulting from that timer or EventEmitter. Same for ReqWraps, however since they are not exposed to the user, there is no way to explicitly add them; they can only be implicitly bound to a domain.

The implementation works by hooking into the process.on('uncaughtException') event. This has the advantage of catching situations that may escape the current call stack, so is a bit more reliable than wrapping callbacks in a try/catch. However, it still depends on using MakeCallback as the singe entry point from C++ to JavaScript, where we use a TryCatch in C++ to route errors to the FatalException exit point.

Performance

There is a slight performance increase owing to the MakeCallback refactoring, but it's not very significant. The cleanup I did was rather naive, and further performance benefits can perhaps be attained by a more thorough cleanup.

Running a http_simple benchmark against /bytes/1024 shows that they're in roughly the same ballpark. These are pretty typical results, and no significant degradation can be detected.

Note that the http_simple.js benchmark in this branch actually creates a domain and adds both the request and response objects to it explicitly, so it is a meaningful real-world use of domains. Of course, if errors are encountered, then this causes a bit of a performance hit, but otherwise, any impact is less than ambient noise.

On master (that is, 93cefab)

$ i=0; while [ $i -lt 10 ] ; do echo -n "$i "; TYPE=bytes LENGTH=1024 NODE_USE_DOMAINS=1 bash benchmark/http.sh 2>&1 | grep Requests ; let "i++" ; done
0 Requests per second:    3403.72 [#/sec] (mean)
1 Requests per second:    2886.88 [#/sec] (mean)
2 Requests per second:    2879.11 [#/sec] (mean)
3 Requests per second:    2918.01 [#/sec] (mean)
4 Requests per second:    3053.11 [#/sec] (mean)
5 Requests per second:    2821.80 [#/sec] (mean)
6 Requests per second:    2883.54 [#/sec] (mean)
7 Requests per second:    2734.61 [#/sec] (mean)
8 Requests per second:    2914.67 [#/sec] (mean)
9 Requests per second:    2813.31 [#/sec] (mean)

After the MakeCallback refactoring, but before the actual domain work:

[gh:node/pre-domain][email protected]:~/dev/js/node-master v0.7.8-pre 4.1.0(1)-release
$ i=0; while [ $i -lt 10 ] ; do echo -n "$i "; TYPE=bytes LENGTH=1024 NODE_USE_DOMAINS=1 bash benchmark/http.sh 2>&1 | grep Requests ; let "i++" ; done
0 Requests per second:    3957.63 [#/sec] (mean)
1 Requests per second:    3145.52 [#/sec] (mean)
2 Requests per second:    3086.34 [#/sec] (mean)
3 Requests per second:    2987.89 [#/sec] (mean)
4 Requests per second:    2718.45 [#/sec] (mean)
5 Requests per second:    3069.57 [#/sec] (mean)
6 Requests per second:    2838.25 [#/sec] (mean)
7 Requests per second:    2851.08 [#/sec] (mean)
8 Requests per second:    2842.76 [#/sec] (mean)
9 Requests per second:    3137.16 [#/sec] (mean)

This branch:

$ i=0; while [ $i -lt 10 ] ; do echo -n "$i "; TYPE=bytes LENGTH=1024 NODE_USE_DOMAINS=1 bash benchmark/http.sh 2>&1 | grep Requests ; let "i++" ; done
0 Requests per second:    3546.52 [#/sec] (mean)
1 Requests per second:    2911.57 [#/sec] (mean)
2 Requests per second:    2925.34 [#/sec] (mean)
3 Requests per second:    3010.00 [#/sec] (mean)
4 Requests per second:    2982.31 [#/sec] (mean)
5 Requests per second:    2915.03 [#/sec] (mean)
6 Requests per second:    2951.25 [#/sec] (mean)
7 Requests per second:    2912.61 [#/sec] (mean)
8 Requests per second:    2914.82 [#/sec] (mean)
9 Requests per second:    2755.61 [#/sec] (mean)

Shortcomings

  • The API feels kind of nice, but it's a bit rough still. I think it's fine for v0.8, and we can refine it further in v0.9 and beyond once users have a chance to play with it. We should not bikeshed it too much at this point. That being said, the method names are sort of weird, and some other things can probably be improved, so if you have any ideas, please share them.
  • The domain.dispose() mechanism is just kludgey. We really need a single way to forcibly shut down handles, requests, and streams, so that we don't leave timers and fd's dangling. Also on the subject of debuggability, it would be nice to ensure that we don't close() a file descriptor while there are still outstanding fs operations on it. This consistency/cleanup can wait, however, and we can make v0.9 all about API polishing.
  • The domain only knows about the items that are explicitly added to it. It's too expensive to push every EventEmitter onto the active domain's member list. (It's also extremely memory intensive, though WeakMaps could perhaps be a solution.) So, when you call domain.dispose(), it disposes the domain itself (ie, renders it a no-op) but it can only .destroy() and clearTimeout the items that have been explicitly added by calling domain.add(emitter), not those that are implicitly bound. It would be nice to find some clever way to have the domain cheaply know which items were created during its reign, but I didn't find one.

@koichik
Copy link

koichik commented Apr 14, 2012

example:

var fs = require('fs');
var domain = require('domain');

var d = domain.create();
d.on('error', function(err) {
  console.log('ERROR', err);
});

d.run(function() {
  foo(__filename, d.bind(function(err, data) {
    console.log(data);
  }));
});

function foo(path, cb) {
  // A) here is a part of domain
  fs.readFile(path, function(err, data) {
    // B) here is not a part of domain
    if (err) throw err;
    cb(null, data.toString());
  });
}

Please imagine that foo is a function from an existing library. When an exception is thrown form A), it is routed to domain's error listener. But from B), it is not routed. I think that domains by this proposal cannot cooperate with a third-party modules or an existing libraries... I read "Shortcomings", but I still think that it is crucial to propagate implicit domain's context.

@isaacs
Copy link
Author

isaacs commented Apr 14, 2012

@koichik No, in the B) section of your example, it would still be part of the domain. ReqWraps are bound to the domain that is active at the time of their creation, and so ReqWrap callbacks are run in the context of the domain.

https://gist.github.com/2385382

@isaacs
Copy link
Author

isaacs commented Apr 14, 2012

Also, fs.readFile uses an EventEmitter internally, so it would also be implicitly bound to the active domain at the time of its creation.

@koichik
Copy link

koichik commented Apr 14, 2012

@isaacs - Sorry, this is my mistake. I cannot reproduce what I wrote in the previous comment.

@koichik
Copy link

koichik commented Apr 14, 2012

I understood my failure. First time, I used process.nextTick() instead of fs.readFile().

function foo(path, cb) {
  // here is a part of domain
  process.nextTick(function() {
    // here is not a part of domain
    throw new Error(); // uncaught by domain
  });
}

After changing process.nextTick to fs.readFile(), I made the mistake in the check of a log.

@isaacs
Copy link
Author

isaacs commented Apr 14, 2012

Ah! Yes! It seems that nextTick is a hole. I'll look into that. Nice find!

@isaacs
Copy link
Author

isaacs commented Apr 14, 2012

Er, wait, no, it isn't. I spoke too soon. nextTick still goes through the same entry point. Updated the gist: https://gist.github.com/2385382

@isaacs
Copy link
Author

isaacs commented Apr 14, 2012

No, I didn't speak too soon :) It's only working by accident. There is indeed a possibility for escape here.

I'll fix it soon.

@koichik
Copy link

koichik commented Apr 14, 2012

@isaacs - Confirmed, the gist works well. However, my code does not work.

var domain = require('domain');

var d = domain.create();
d.on('error', function(err) {
  console.log('Domain caught error', err);
});

d.run(function() {
  foo();
});

function foo() { 
  // here is a part of domain
  process.nextTick(function() {
    // here is not a part of domain
    throw new Error('nexttick in foo');
  });
}

result:

$ ~/git/isaacs/node/node d.js

domain.js:66
    throw er;
          ^
Error: nexttick in foo
    at Array.0 (/tmp/d.js:17:11)
    at EventEmitter._tickCallback (node.js:238:41)

In this case, there is no active domain within process.nextTick().

@koichik
Copy link

koichik commented Apr 14, 2012

I was too late :)

@isaacs
Copy link
Author

isaacs commented Apr 14, 2012

This would close the leak:

diff --git a/src/node.js b/src/node.js
index 3322df6..d0ea1b4 100644
--- a/src/node.js
+++ b/src/node.js
@@ -249,6 +249,7 @@
     };

     process.nextTick = function(callback) {
+      if (process.domain) callback = process.domain.bind(callback);
       nextTickQueue.push(callback);
       process._needTickCallback();
     };

However, the cleaner approach would be to push objects rather than functions into the tick queue, and assign the active domain to them.

Also, it would be nice if the MakeCallback function could check the _disposed member of the relevant domain, and not fire the callback at all in that case.

@isaacs
Copy link
Author

isaacs commented Apr 14, 2012

c9c44e0 fixes the issue that @koichik found.

7232710008bdfabd17b6328e755dcf6d4b745929 seems somewhat controversial to me. It prevents callbacks from firing at all if they are bound to a domain that has been disposed. I could see this leading in some rare cases to memory leaks, since there may be IO that needs to be cleaned up. On the other hand, it's the responsibility of the user to clean up their stuff when they call domain.dispose().

@isaacs
Copy link
Author

isaacs commented Apr 14, 2012

Er, there's a bug in c9c44e0. Fixed by rebasing to ab886be and 054740e.

@shigeki
Copy link

shigeki commented Apr 15, 2012

It seems that setTimeout() causes the leak of domain.members array because the object of timers in domain.members are not removed after ontimeout.
The bug example shows as below in which setTimeout() is called to check Date field cache.
I have not read implementation of Domains3 in detail yet but this quick fix 82d28f3 seems working fine. Please review it.

// test-domain-leak.js
var http = require('http');
var domain = require('domain');
var d = domain.create();
var port = 8000;
d.on('error', function(err) {
  console.log('domain.members.length:', err.domain.members.length);
});
var server = http.createServer(d.bind(function(req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end("Hello");
  throw new Error('error');
})).listen(port, function() {
  setInterval(function() {
    var options = {
      host: 'localhost',
      port: port,
      path: '/'
    };
    http.get(options);
  }, 1000);
});
> ./node test-domain-leak.js
domain.members.length: 1
domain.members.length: 2
domain.members.length: 3
domain.members.length: 4

@isaacs
Copy link
Author

isaacs commented Apr 15, 2012

Good catch, @shigeki. Fixed more simply on 3dbc73f. Timers shouldn't be explicitly bound anyway, only implicitly.

@shigeki
Copy link

shigeki commented Apr 16, 2012

@isaacs Confirmed that setTimeout was fixed. Thanks. But a callback of setInterval() is still out of domain.

// test-setinterval.js
var domain = require('domain');
var d = domain.create();
d.on('error', function(err) {
  console.log(err);
});
function foo(msg) {
  var timer = setInterval(function() {
    clearTimeout(timer);
    throw new Error('error test');
  }, 100);
}
d.run(foo);
unixjp:~/tmp/isaacs/node> ./node test-setinterval.js

domain.js:66
    throw er;
          ^
Error: error test
    at Timer.<anonymous> (/home/ohtsu/tmp/isaacs/node/test-setinterval.js:10:11)
    at Timer.<anonymous> (timers.js:232:14)

Fixed in 317f990 as below. Please review it.

unixjp:~/tmp/isaacs/node> ./node test-setinterval.js
{ [Error: error test]
  domain: { members: [], _events: { error: [Function] } },
  domain_thrown: true }

@isaacs
Copy link
Author

isaacs commented Apr 16, 2012

@shigeki Indeed. I fixed this on c2e4a5dc7530f77a16cfe97793c41e64770630bf by making the key always "domain" rather than "_domain", so that MakeCallback picks it up properly.

@shigeki
Copy link

shigeki commented Apr 16, 2012

That's much more elegant. Confirmed that timers works good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is swallowing errors a good idea here? Shouldn't they be logged or emitted to somewhere somehow?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, they'd always be errors like that you can't .end() a stream that's already closed or something. Ie, useless and extraneous. Until we have a consistent way to shut things down, and to tell if they've been shut down, I think this is about the best we can do.

isaacs added 12 commits April 17, 2012 11:56
This is a squashed commit of the main work done on the domains-wip branch.

The original commit messages are preserved for posterity:

* Implicitly add EventEmitters to active domain
* Implicitly add timers to active domain
* domain: add members, remove ctor cb
* Don't hijack bound callbacks for Domain error events
* Add dispose method
* Add domain.remove(ee) method
* A test of multiple domains in process at once
* Put the active domain on the process object
* Only intercept error arg if explicitly requested
* Typo
* Don't auto-add new domains to the current domain

    While an automatic parent/child relationship is sort of neat,
    and leads to some nice error-bubbling characteristics, it also
    results in keeping a reference to every EE and timer created,
    unless domains are explicitly disposed of.

* Explicitly adding one domain to another is still fine, of course.
* Don't allow circular domain->domain memberships
* Disposing of a domain removes it from its parent
* Domain disposal turns functions into no-ops
* More documentation of domains
* More thorough dispose() semantics
* An example using domains in an HTTP server
* Don't handle errors on a disposed domain
* Need to push, even if the same domain is entered multiple times
* Array.push is too slow for the EE Ctor
* lint domain
* domain: docs
* Also call abort and destroySoon to clean up event emitters
* domain: Wrap destroy methods in a try/catch
* Attach tick callbacks to active domain
* domain: Only implicitly bind timers, not explicitly
* domain: Don't fire timers when disposed.
* domain: Simplify naming so that MakeCallback works on Timers
* Add setInterval and nextTick to domain test
* domain: Make stack private
@isaacs isaacs merged commit d4ed2e6 into nodejs:master Apr 17, 2012
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants