-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[WIP] Allow hadd to run in multiprocess mode #491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
the commit seems to mix white space changes and functional changes making it hard to figure out the functional changes. |
|
Oh, no. Sorry, my editor wasn't showing the space changes when in diff mode. I'll fix it over the weekend. |
|
Starting build on |
|
Starting build on |
|
Starting build on |
| } | ||
| if (request == 1) { | ||
| request = strtol(argv[a+1], 0, 10); | ||
| if (request < kMaxLong && request >= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that request is a long isn't the first part always true? If I remember correctly, one has to check errno to figure out if something went wrong in strtol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily.
From cpp reference:
- If successful, an integer value corresponding to the contents of str is returned.
- If the converted value falls out of range of corresponding return type, a range error occurs (setting errno to ERANGE) and LONG_MAX, LONG_MIN, LLONG_MAX or LLONG_MIN is returned.
- If no conversion can be performed, 0 is returned
Which means that this may be incorrect when no conversion can be performed. There are a couple more places were hadd was already doing that. I'll check.
main/src/hadd.cxx
Outdated
| std::cout<<"hadd failed at the parallel stage"<<std::endl; | ||
| } | ||
| for(auto pf:partialFiles){ | ||
| gSystem->Unlink(pf.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be helpfull for debugging to have a way to disable the unlink of the intermediary files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be useful, yes.
How? Adding another option? This will only be used in the multiprocess case, so maybe a combination of the two? Something like -jdbg?
main/src/hadd.cxx
Outdated
| if(multiproc){ | ||
| for(auto i = 0; (i*step)<filesToProcess; i++) { | ||
| std::stringstream buffer; | ||
| buffer <<"partial"<<i<<".root"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read correctly the intermediary files name are "partial1.root", "partial2.root". If this is the case, we may want to enhance the name to allow for the possibility of running two hadd in the same directory at the same time (i.e. at the moment this is not possible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Will look into that.
Appends the same unique identifier to the name of the partial files within the same hadd execution.
|
Hi @xvallspl, nice. Are now the partial files in the /tmp or equivalent directory? Do we have a switch not to unlink them after merging and to print their names for debugging purposes? |
|
Hi, Danilo.
They are not in the tmp directory, but I guess they could end up there.
Should be documented for the debugging case.
I'm not sure what to call the switch, as I comented previously. Maybe
-jdbg? as It is only for the parallel case? maybe include it in the -v case?
Cheers,
Xavi
…On Tue, Apr 11, 2017 at 2:50 PM Danilo Piparo ***@***.***> wrote:
Hi @xvallspl <https://github.com/xvallspl>, nice. Are now the partial
files in the /tmp or equivalent directory? Do we have a switch not to
unlink them after merging and to print their names for debugging purposes?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#491 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA074yyke_AsTNkvQUsWssbN0gdB4yW_ks5ru3b_gaJpZM4M3FNv>
.
|
|
I would not overload -v with a change in behavior. I agree that -jdbg is a good option. -dbg might be even better (i.e. even if at the moment it only change the parallel behavior, it might change behavior in the scalar case also ... in the future). |
Because the intermediary files could be large or numerous, if we rely on a shared directory we ought to create them in a user-specific subdirectory. However using a shared directory may cause problem in itself. On some system /tmp is small and /var/tmp should be used (or maybe simply use what TMPDIR says). All in all, it might even be better to use a (subdirectory of the) output directory which is, per se, guaranteed to be writeable by the user (or the output can not be done). However, whether it has enough space for twice the final output size is a (small) concern. |
|
@phsft-bot build! |
|
@pcanal: In a previous incarnation of this pr I suggested to use TSystem to get the right tmp dir. The local dir, in case of eos or afs or other shared file systems can become a huge performance penalty. |
|
Should I add a -d option for specifying the work directory and make it $TMP by default? |
Good idea.
Good point indeed. I wonder if we have a means of knowing whether the destination directory is local (aka 'fast') or not. We can find out whether the file URL is on the local node or not (via TFile::GetType) but this does not tell us whether it is on afs or not. |
This avoids the deletion of the intermediate files.
|
@phsft-bot build! |
|
Starting build on |
|
Starting build on |
|
@xvallspl : I reverted the PR. It breaks classic builds as TProcessExecutor is not available there. Perhaps we can take this opportunity to have a special case for the single threaded mode also in the cmake builds which is identical to the previous version of hadd? |
No description provided.