[WIP] Allow hadd to run in multiprocess mode #491

xvallspl · 2017-04-07T16:00:30Z

No description provided.

pcanal · 2017-04-07T16:02:34Z

The commit seems to mix whitespace changes and functional changes, making it hard to identify the functional changes.

xvallspl · 2017-04-07T16:12:21Z

Oh, no.

Sorry, my editor wasn't showing the space changes when in diff mode.

I'll fix it over the weekend.

phsft-bot · 2017-04-07T16:43:28Z

Starting build on centos7/gcc49, mac1011/native, slc6/gcc49, slc6/gcc62, ubuntu14/native and CMake flags -Dccache=ON -Dimt=OFF

phsft-bot · 2017-04-09T07:30:11Z

Starting build on centos7/gcc49, mac1011/native, slc6/gcc49, slc6/gcc62, ubuntu14/native and CMake flags -Dccache=ON -Dimt=OFF

phsft-bot · 2017-04-09T07:32:49Z

Starting build on centos7/gcc49, mac1011/native, slc6/gcc49, slc6/gcc62, ubuntu14/native and CMake flags -Dccache=ON -Dimt=OFF

pcanal · 2017-04-10T16:38:16Z

main/src/hadd.cxx

+            }
+            if (request == 1) {
+               request = strtol(argv[a+1], 0, 10);
+               if (request < kMaxLong && request >= 0) {


Given that request is a long, isn't the first part always true? If I remember correctly, one has to check errno to figure out whether something went wrong in strtol.

Not necessarily.

From cppreference:

If successful, an integer value corresponding to the contents of str is returned.

If the converted value falls out of range of corresponding return type, a range error occurs (setting errno to ERANGE) and LONG_MAX, LONG_MIN, LLONG_MAX or LLONG_MIN is returned.

If no conversion can be performed, 0 is returned

Which means that the check may be incorrect when no conversion can be performed, since the returned 0 passes the range test. There are a couple more places where hadd was already doing that. I'll check.
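For reference, a minimal sketch of a stricter check that covers the failure modes quoted above. The helper name `ParseNonNegativeLong` is hypothetical and is not part of hadd; it only illustrates combining the end pointer with errno.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper (not hadd code): parse a non-negative decimal integer
// and report each strtol failure mode explicitly.
bool ParseNonNegativeLong(const char *text, long &out)
{
   if (text == nullptr) return false;
   char *end = nullptr;
   errno = 0;                              // strtol leaves errno untouched on success
   const long value = std::strtol(text, &end, 10);
   if (end == text) return false;          // no conversion could be performed
   if (*end != '\0') return false;         // trailing characters, e.g. "4x"
   if (errno == ERANGE) return false;      // out of range: LONG_MAX or LONG_MIN was returned
   if (value < 0) return false;            // a negative count is not meaningful here
   out = value;
   return true;
}
```

With this shape, an argument such as "abc" is rejected instead of being silently read as 0.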

pcanal · 2017-04-10T16:45:43Z

main/src/hadd.cxx

+         std::cout<<"hadd failed at the parallel stage"<<std::endl;
+      }
+      for(auto pf:partialFiles){
+         gSystem->Unlink(pf.c_str());


It might be helpful for debugging to have a way to disable the unlink of the intermediary files.

It could be useful, yes.
How? Adding another option? This will only be used in the multiprocess case, so maybe a combination of the two? Something like -jdbg?

pcanal · 2017-04-10T16:47:27Z

main/src/hadd.cxx

+   if(multiproc){
+      for(auto i = 0; (i*step)<filesToProcess; i++) {
+         std::stringstream buffer;
+         buffer <<"partial"<<i<<".root";


If I read correctly, the intermediary file names are "partial1.root", "partial2.root". If this is the case, we may want to enhance the names to allow for the possibility of running two hadd processes in the same directory at the same time (at the moment this is not possible).

Makes sense. Will look into that.

Appends the same unique identifier to the name of the partial files within the same hadd execution.
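One way to picture the naming scheme is the sketch below. It assumes a random per-run token; the actual commit may derive its identifier differently, and both function names are hypothetical.

```cpp
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: build one token per hadd run. A random token makes a clash
// between two concurrent runs unlikely, but it is not a strict guarantee.
std::string MakeRunId()
{
   std::random_device rd;
   std::ostringstream id;
   id << std::hex << rd() << rd();
   return id.str();
}

// Every partial file of one run carries the same runId, so two runs in the
// same directory write to different file names.
std::vector<std::string> PartialFileNames(const std::string &runId, int nChunks)
{
   std::vector<std::string> names;
   for (int i = 0; i < nChunks; ++i) {
      std::ostringstream name;
      name << "partial" << i << "_" << runId << ".root";
      names.push_back(name.str());
   }
   return names;
}
```

Calling `PartialFileNames(MakeRunId(), n)` once per execution keeps the names consistent within a run.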

dpiparo · 2017-04-11T12:50:07Z

Hi @xvallspl, nice. Are the partial files now in /tmp or an equivalent directory? Do we have a switch not to unlink them after merging and to print their names for debugging purposes?

xvallspl · 2017-04-11T13:07:44Z

Hi, Danilo. They are not in the tmp directory, but I guess they could end up there. This should be documented for the debugging case. I'm not sure what to call the switch, as I commented previously. Maybe -jdbg, as it is only for the parallel case? Or maybe include it in the -v case? Cheers, Xavi

pcanal · 2017-04-11T14:32:50Z

I would not overload -v with a change in behavior. I agree that -jdbg is a good option. -dbg might be even better (i.e. even if at the moment it only changes the parallel behavior, it might also change behavior in the scalar case in the future).
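A rough sketch of how such a debug switch could gate the cleanup. `CleanupPartialFiles` is a hypothetical helper, and `std::remove` stands in for the `gSystem->Unlink` call used in the PR so that the example compiles without ROOT.

```cpp
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: delete the partial files unless debug mode is on.
// Returns the number of files actually removed.
int CleanupPartialFiles(const std::vector<std::string> &partialFiles, bool debug)
{
   if (debug) {
      // Keep the files and print their names so they can be inspected.
      for (const auto &pf : partialFiles)
         std::cout << "Keeping partial file: " << pf << std::endl;
      return 0;
   }
   int removed = 0;
   for (const auto &pf : partialFiles) {
      if (std::remove(pf.c_str()) == 0) ++removed;
   }
   return removed;
}
```

Printing the names in debug mode also covers the request to make the intermediate files easy to find.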

pcanal · 2017-04-11T14:38:33Z

Are the partial files now in /tmp or an equivalent directory?

Because the intermediary files could be large or numerous, if we rely on a shared directory we ought to create them in a user-specific subdirectory.

However, using a shared directory may cause problems in itself. On some systems /tmp is small and /var/tmp should be used (or maybe simply use what TMPDIR says).

All in all, it might even be better to use a (subdirectory of the) output directory which is, per se, guaranteed to be writeable by the user (or the output can not be done). However, whether it has enough space for twice the final output size is a (small) concern.
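The trade-offs above suggest a simple lookup order, sketched here under stated assumptions: an explicitly requested directory wins, then TMPDIR, then /tmp as a last resort. `ChooseWorkDir` is a hypothetical name, and the fallback order is one possible policy rather than what hadd implements.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: pick the directory for the intermediate files.
// userDir is whatever the caller asked for; empty means "no preference".
std::string ChooseWorkDir(const std::string &userDir)
{
   if (!userDir.empty()) return userDir;          // explicit choice wins
   const char *tmpdir = std::getenv("TMPDIR");    // honour the environment next
   if (tmpdir != nullptr && *tmpdir != '\0') return tmpdir;
   return "/tmp";                                 // assumed POSIX fallback
}
```

The sketch does not check that the chosen directory is writable or has enough free space, which the discussion identifies as real concerns.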

martinmine · 2017-04-11T14:43:52Z

@phsft-bot build!

dpiparo · 2017-04-11T14:52:18Z

@pcanal: In a previous incarnation of this PR I suggested using TSystem to get the right tmp dir. The local dir, in the case of eos, afs, or other shared file systems, can become a huge performance penalty.

xvallspl · 2017-04-11T14:57:19Z

Should I add a -d option for specifying the work directory and make it $TMP by default?

pcanal · 2017-04-11T14:59:28Z

Should I add a -d option for specifying the work directory and make it $TMP by default?

Good idea.

In a previous incarnation of this PR I suggested using TSystem to get the right tmp dir. The local dir, in the case of eos, afs, or other shared file systems, can become a huge performance penalty.

Good point indeed. I wonder if we have a means of knowing whether the destination directory is local (aka 'fast') or not. We can find out whether the file URL is on the local node or not (via TFile::GetType) but this does not tell us whether it is on afs or not.

This avoids the deletion of the intermediate files.

xvallspl · 2017-04-12T14:11:25Z

@phsft-bot build!

phsft-bot · 2017-04-12T14:35:52Z

Starting build on centos7/gcc49, mac1011/native, slc6/gcc49, slc6/gcc62, ubuntu14/native and CMake flags -Dccache=ON -Dimt=OFF

phsft-bot · 2017-04-18T13:18:29Z

Starting build on centos7/gcc49, mac1011/native, slc6/gcc49, slc6/gcc62, ubuntu14/native and CMake flags -Dccache=ON -Dimt=OFF

dpiparo · 2017-04-28T06:23:18Z

@xvallspl: I reverted the PR. It breaks classic builds, as TProcessExecutor is not available there. Perhaps we can take this opportunity to add a special case for the single-threaded mode, also in the CMake builds, that is identical to the previous version of hadd?

xvallspl added 2 commits April 9, 2017 09:19

Allow hadd to run in multiprocess

4e8c226

Apply clang-format to hadd

3235a7f

xvallspl force-pushed the hadd branch from f2d80fe to 3235a7f Compare April 9, 2017 07:29

Fix typo in help (hadd)

7cd817a

pcanal reviewed Apr 10, 2017

xvallspl changed the title ~~Allow hadd to run in multiprocess mode~~ [WIP} Allow hadd to run in multiprocess mode Apr 11, 2017

xvallspl changed the title ~~[WIP} Allow hadd to run in multiprocess mode~~ [WIP] Allow hadd to run in multiprocess mode Apr 11, 2017

xvallspl force-pushed the hadd branch from f61b836 to abff616 Compare April 11, 2017 12:45

Resolves naming conflicts in partial files in hadd

e2f5058

Appends the same unique identifier to the name of the partial files within the same hadd execution.

xvallspl force-pushed the hadd branch from abff616 to e2f5058 Compare April 11, 2017 12:46

xvallspl added 2 commits April 11, 2017 17:03

Add debug option for parallel hadd.

b9b1d3f

This avoids the deletion of the intermediate files.

Allow user to specify temp dir.

ef34f13

xvallspl force-pushed the hadd branch from 7fffa25 to ef34f13 Compare April 11, 2017 17:09

Check step size only when multiprocess

a7d49fa

dpiparo merged commit 09ed3f3 into root-project:master Apr 26, 2017

phsft-bot mentioned this pull request Nov 25, 2017

[cxxmodules] Preload tmva tree player graf #1365

Closed

phsft-bot mentioned this pull request Feb 16, 2018

[modules][cxxmodules] Improve the layering of our special modules. #1412

Merged

phsft-bot mentioned this pull request Apr 3, 2018

[cxxmodules] Fix failing runtime_cxxmodules tests by preloading modules #1814

Merged

phsft-bot mentioned this pull request Apr 26, 2018

[cxxmodules] Adding missing dependency for pyroot #1922

Merged

phsft-bot mentioned this pull request Sep 27, 2022

[cxxmodules][cling] Avoid loading some unnecessary modules #10910

Closed
