[hadd] add option to specify which objects should (not) be merged #5995
Can one of the admins verify this patch?
Hello! :) Non-functional issue but before actual review may I suggest blocklist / allow list terminology?
Oh, sorry! I've changed the terms
main/src/hadd.cxx
Outdated
#include "TKey.h"
#include "TClass.h"
#include "TSystem.h"
#include "TStopwatch.h"
Indeed we ought to skip this.
This looks like a nice feature, thanks. I wonder if, rather than a single list (that is either an inclusion list or an exclusion list), we should allow both ... humm ... right .. this is not yet supported by TFileMerger :( ...
Co-authored-by: Philippe Canal <[email protected]>
…to do multiprocessing with filelist
Thank you for taking a look @pcanal ! I have removed the Stopwatch and the multiprocessing part. I have looked into multiprocessing further, but even the current version of Also, the 2 items listed under shortcomings in the description remain. Let me know if I can do something about them.
Why can't it work? Having the @<inputfile_list> should end up being equivalent to listing the files explicitly on the command line. What am I missing? Adding the @<inputfile_list> is a good idea that should be independent of whether the multiprocessing works (I assume you mean the feature where hadd spawns multiple processes; this feature is helpful when the files contain (many) histograms and is a slight pessimization when they contain TTrees).
Do you have a stack trace of where the threads/processes stall?
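To make the intended equivalence concrete, here is a minimal sketch of expanding `@<inputfile_list>` arguments into the same flat list of files one would get by naming them explicitly. It is not the code in hadd.cxx: the helper name `ExpandInputs` is hypothetical, and it assumes the list file holds one file name per line. Any decision about how many processes to spawn would then be based on the size of the expanded list rather than on the number of command-line arguments.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of hadd: expand every argument of the form
// "@listfile" into the file names stored in listfile (assumed one per line),
// and pass all other arguments through unchanged.
std::vector<std::string> ExpandInputs(const std::vector<std::string> &args)
{
   std::vector<std::string> files;
   for (const auto &arg : args) {
      if (!arg.empty() && arg[0] == '@') {
         std::ifstream list(arg.substr(1)); // strip the leading '@'
         std::string line;
         while (std::getline(list, line)) {
            if (!line.empty()) // skip blank lines in the list file
               files.push_back(line);
         }
      } else {
         files.push_back(arg);
      }
   }
   return files;
}
```

With this shape, a single `@inputs.txt` argument that names many files yields many entries, so counting `ExpandInputs(args).size()` reflects the number of input files rather than the number of arguments.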
TFileMerger indeed creates the directories in the output file before it checks whether they have any content. I suspect we could remove a directory if it ends up empty.
Sorry for not directly referring to the code here. It counts the number of input arguments rather than the number of input files to merge. If you only give it a single
True, there is probably no speed gain at all for
With hadd "From tags/v6-20-04@v6-20-04" I got So hadd stalls after TMerger added the first input file for each process.
In the meantime I ran into another issue that concerns empty directories. In the files I'm trying to merge, it rarely happens that a directory/tree is empty because no events have been selected (for that specific selection). In such a case Trying my local version built with debug symbols and running gdb didn't yield further info. I was a bit puzzled to see this, since I could swear that I successfully merged files with empty directories in the past. And in fact, it works with root Do you have an idea where this could come from? Since this is only loosely related to the actual PR, it might not be the right place to discuss this. I can post it elsewhere if that would make sense.
Hi all! This is my first PR, please be gentle 🙂
The PR is meant to trigger a discussion on the feature described in the title, rather than being a request that it should go in as-is.
Description
Add -l and -e command line options for passing a list of objects to TFileMerger::AddObjectNames and setting the TFileMerger::kSkipListed or TFileMerger::kOnlyListed flags.
Purpose
Allows merging only certain objects from the list of input files.
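As an illustration of the selection semantics only, the sketch below models the two modes with plain standard-library C++. It does not use ROOT and is not TFileMerger's implementation: the `ListMode` enum and the helpers `ParseObjectNames` and `ShouldMerge` are hypothetical names, and the names are assumed to arrive as one whitespace-separated string.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the two filtering modes; not ROOT's actual flags.
enum class ListMode { kOnlyListed, kSkipListed };

// Split a whitespace-separated list of object names into a set
// (assumed input format for this sketch).
std::set<std::string> ParseObjectNames(const std::string &names)
{
   std::set<std::string> result;
   std::istringstream in(names);
   std::string name;
   while (in >> name)
      result.insert(name);
   return result;
}

// Decide whether an object should be merged:
// kOnlyListed keeps listed names, kSkipListed keeps everything else.
bool ShouldMerge(const std::string &objectName, const std::set<std::string> &listed, ListMode mode)
{
   const bool isListed = listed.count(objectName) > 0;
   return mode == ListMode::kOnlyListed ? isListed : !isListed;
}
```

In this model the same list serves as either an allow list or a block list depending on the mode, which matches the single-list design discussed in the conversation.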
Use case
Merging single directories/trees into separate root files. More concretely: when producing nTuples (on the WLCG) with common LHCb tools, one sometimes wants to run different selections on the same input stream and write out a tree for each of the selections. All these trees end up in the same root file.
Shortcomings
TFileMerger rather than hadd.