-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Add TBufferMerger and TBufferMergerFile to allow parallel data creating and writing to single file #533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TBufferMerger and TBufferMergerFile to allow parallel data creating and writing to single file #533
Conversation
|
Starting build on |
tree/tree/inc/TTreeMT.h
Outdated
| fMergingThread->join(); | ||
|
|
||
| fIsMerged = true; | ||
| return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a reminder, we need to return the actual tree here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a FIXME would be good.
|
Hi,
|
|
Another comment: the headers are quite fat. Perhaps we can have in the headers the interfaces and have cxx files with the implementation and helper functions hidden from the interface (and the pch)? |
ef68901 to
e1e4692
Compare
|
Starting build on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
Here is my first set of comments.
On general level:
- It seems the headers do not include the ROOT-style heading.
- I find the TThread prefix in the naming exposing an implementation detail. In fact, it seems that we do not rely on threading in the file implementation, we just provide discriminate file blocks to the writers and then we squash them together.
- It seems misleading to the casual reader/developer/user to use the client-server naming (we have https server and networking libaries in ROOT). I'd use naming around discriminate writing blocks or smth like this.
- I'd like to see in-tree unit tests of the new file merging.
io/io/inc/TThreadMergingFile.h
Outdated
| // // | ||
| ////////////////////////////////////////////////////////////////////////// | ||
|
|
||
| #include "dcqueue.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pcanal, I am wondering if there is not already a ROOT container which we could rely on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems ROOT has no equivalent at this time, so we are thinking about using tbb::concurrent_queue instead of this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that leak a header-wise dependency on tbb, which we are trying to avoid in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The header dependency can probably be avoided if we move implementation to source files, as was suggested by Danilo (headers are "fat" now...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that sounds reasonable to me.
io/io/inc/TThreadMergingServer.h
Outdated
| TTimeStamp fLastContact; | ||
| Double_t fTimeSincePrevContact; | ||
|
|
||
| ClientInfo() : fFile(0), fLocalName(), fContactsCount(0), fTimeSincePrevContact(0) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do in-place initialization instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean here, can you explain? Do you want to initialize the variables in the place they are declared in the class, or initialize ClientInfo where it's created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant, it'd be better to initialize the variables in place, i.e. as part of their definitions.
io/io/inc/TThreadMergingServer.h
Outdated
| ClientInfo() : fFile(0), fLocalName(), fContactsCount(0), fTimeSincePrevContact(0) {} | ||
| ClientInfo(const char *filename, UInt_t clientId); | ||
|
|
||
| void Set(TFile *file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect (reading from the name) that this is a regular setter. It seems, that we are doing more work in this method eg. some non-trivial file registration. Could you pick up a better name (eg. RegisterFile) and add doxygen style documentation?
io/io/inc/TThreadMergingServer.h
Outdated
| return fFilename; | ||
| } | ||
|
|
||
| Bool_t InitialMerge(TFile *input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have documentation on each method.
io/io/inc/dcqueue.h
Outdated
| @@ -0,0 +1,212 @@ | |||
| //===----------------------------------------------------------------------===// | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be different from the ROOT-style headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is the code as imported. I wanted to give correct authorship to it, and then make my modifications on top. I will put everything in ROOT style as part of the work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, as long as this doesn't end up in master like that I am happy.
io/io/src/TThreadMergingFile.cxx
Outdated
| Int_t TThreadMergingFile::Write(const char *n, Int_t opt, Int_t bufsize) const | ||
| { | ||
| Error("Write const", "A const TFile object should not be saved. We try to proceed anyway."); | ||
| return const_cast<TThreadMergingFile *>(this)->Write(n, opt, bufsize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having to const_cast that way here seems a bug (or a missing interface) to me.
tree/tree/inc/TTreeMT.h
Outdated
| fMergingThread->join(); | ||
|
|
||
| fIsMerged = true; | ||
| return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a FIXME would be good.
io/io/src/TThreadMergingServer.cxx
Outdated
| } | ||
|
|
||
| TIter next(&mergers); | ||
| ThreadFileMerger *info; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not moving the var definition along with the init?
| return BranchNames_t(bnBegin, bnBegin + nExpectedBranches); | ||
| } | ||
|
|
||
| #if 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is WIP, as mentioned in the header of the PR. I want to be able to switch between the old and new implementations for testing. It will, of course, go away before this gets merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for explaining.
e1e4692 to
8048307
Compare
|
Starting build on |
8048307 to
bb590be
Compare
|
Starting build on |
bb590be to
676f1bc
Compare
|
Starting build on |
|
Hi @amadio,
|
676f1bc to
2b38bab
Compare
|
Starting build on |
|
Hi @dpiparo,
I would prefer to not discuss names. I think it's just easier if you and Philippe propose the names, and then I can rename the classes accordingly. I have no strong opinion on what to name them.
You are right, it's probably not needed. I left them as they were in the imported code. If not inheriting from TObject is desirable, I would prefer to avoid it too.
Done in my last push. I still need to move the appropriate things into Internal and Detail, though.
In order to use either unique_ptr or shared_ptr, the constructor/destructor for the client needs to be public. What I did was to return a raw pointer, and only the server is able to create/destroy clients.
This is done now. Since only the server can create/destroy clients, there is no need to throw. The server just destroys the clients before itself gets destroyed.
Yes, I will do that and push again soon. Only the ClientInfo and ThreadFileMerger classes have to be moved. I may fuse ThreadFileMerger into the server class if I can instead, so that we will have no need for either the ThreadFileMerger or ClientInfo anymore.
After I either fuse the classes as mentioned above, or move them into the Internal/Detail namespaces, this will be solved as well. |
I thought we had settled on the reverse? I.e. return a unique_ptr and the client has some sort of weak_ptr to some server own memory and then we throw if the client try to access the server after it dies? |
2b38bab to
ddfb310
Compare
|
Starting build on |
Yes, do not inherit from TObject unless you use one of its facility:
In addition any class inheriting directly or indirectly from TObject must have a ClassDef. |
There is a conflict between using unique_ptr, which requires public constructor/destructor, and having only the server being able to create them, by making the constructor/destructor private, which was suggested by Danilo. I think the solution with private constructor/destructor is nice, since the lifetime of the clients is forced to be the same as the server. The user needs to understand, however, that once the server goes down, all clients go down with it too. |
|
Apparently, inheriting from TObject is needed, unfortunately: I will try to get rid of |
At this point, I don't think that is necessary. That class is in one the use case (i.e. using a ROOT collection). So unless it is clear that replacing THashTable with an equivalent stl collection is much more performant, there is no need to waste time changing the code. |
io/io/inc/ROOT/TBufferMerger.hxx
Outdated
| * This function must be called before the TBufferMergerFile gets destroyed, | ||
| * or no data is appended to the TBufferMerger. | ||
| */ | ||
| virtual Int_t Write(const char *name = nullptr, Int_t opt = 0, Int_t bufsize = 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the override keyword to mark what is oveloaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, will come with next push.
|
|
||
| /** Write StreamerInfo for objects that have not already been written. */ | ||
| void WriteStreamerInfo(); | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the class inherits from (indirectly) TObject, it must have a ClassDef which could be here
ClassDefOverride(TBufferMergerFile,0);
The 0 indicate that the class's data member default to transient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for noticing that. I tried to get rid of the TObject inheritance, but it's actually needed, unfortunately. What is difference between ClassDef and ClassDefOverride?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should I add a ClassDef to TBufferMerger? I can't get ROOT to know of its existence when running the tutorial as an interpreted script... Maybe I need a ClassImp in the source file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it's actually needed, unfortunately.
Why unfortunately :) ?
What is difference between ClassDef and ClassDefOverride?
ClassDef introduce overload of some of the interface in TObject. Once you add the override keyword to any overload, this signal the compiler that you want warning if you are missing they keyword on any other overload. ClassDefOverride includes the keyword in the right places :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't get ROOT to know of its existence when running the tutorial as an interpreted script...
This likely because the auto-parsing of the header is not enable ... i.e. you need to generate a dictionary for both TBufferMergerFile (for both auto-parsing and fulling the ClassDef) and TBufferMerger (for the auto-parsing) ...
and as a general rule we need a dictionary for all user visible classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am getting errors after adding the ClassDefs you suggested.
Yes, you need to generate the dictionary for those. (And thus update the LinkDef.h file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TObject inheritance is needed for ThreadFileMerger, TBufferMerger does not inherit from TObject.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vgvassilev ThreadFileMerger, where the THashTable is needed, will go away later, so I wouldn't worry about fixing this right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do I need to do to have the dictionaries generated? I added ClassDef and ClassImp already, and I now get the errors above...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClassImp is no longer necessary/used. It was to record the name of the source file for THtml to be able to generate documentation.
To generate the dictionary, you need to update the file io/io/inc/LinkDef.h to list the new classes. You may also need to update the CMakeList.txt to make sure the new header files are passed to the dictionary generation step.
io/io/test/TBufferMerger.cxx
Outdated
|
|
||
| using namespace ROOT::Experimental; | ||
|
|
||
| static void fill(std::shared_ptr<TTree> tree, int init, int count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even for test, we ought to follow the function naming convention :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| const size_t nEventsPerWorker = nEvents / nWorkers; | ||
|
|
||
| // Make ROOT thread-safe | ||
| ROOT::EnableThreadSafety(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend we move this must early (first statement) to make it more obvious (that it is required if you do multithreading).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, will come with next push.
6d03994 to
5c73ea6
Compare
|
Starting build on |
|
Build failed on mac1012/native. Errors:
Warnings:
|
5c73ea6 to
45c9fec
Compare
|
Starting build on |
|
Alright, I added dictionary generation and fixed the tutorial. It now runs fine, as well as the tests. |
|
Build failed on centos7/gcc49. Failing tests: |
|
It looks like the test failed because it got stuck waiting to get data, but it never came. It doesn't get stuck on my machine, so I didn't see the problem before. I will log into the build node and fix it. That's probably the only remaining thing before we can merge this in. |
|
I logged into the node and it seems the node may just have been overloaded when the test was run, so it timed out. I ran the test several times, and the test succeeded everytime, even though it took 3 minutes to run the first time. Running 'top' I saw that there were some heavy compilations running at the same time, taking all CPU available... Maybe we should limit the number of simultaneous jobs on each node, and/or the maximum resources a job can use, so that there is always at least one full core available to each job. For now, I am just going to rebase and push again. I think the tests should succeed if the build node is not too busy. |
TBufferMerger is a class to facilitate writing data in parallel from multiple threads, while writing to a single output file. Its purpose is similar to TParallelMergingFile, but instead of using processes that connect to a network socket, TBufferMerger uses threads that each write to a TBufferMergerFile, which in turn push data into a queue managed by the TBufferMerger. These classes were originally written by Witold Pokorski and Philippe Canal for the GeantV project. There, they are named TThreadedMergingServer and TThreadedMergingFile, respectively. The imported code was then modified into the current version to be used initially by TDataFrame classes, and later by general ROOT users.
45c9fec to
ed8dc20
Compare
|
Starting build on |
|
The failure is not related to the code in this PR, apparently: |
|
@phsft-bot build |
|
Starting build on |
|
Merged. Nice job and thanks for following up all the comments which have been made. |
|
Thanks, Danilo. I will follow up with more fixes once we get the snapshot done as well. |
This PR is a work in progress for a parallel version of the snapshot action introduced recently to TDataFrame. This version compiles and passes the
test_snaphot.Ctest from roottest.git, but still needs quite a bit of work.I imported the files we use with attributed authorship for each part, but we now need to move them to the right place if needed and refactor the interfaces according to feedback from various sources.
Please feel free to make comments directly on the code, and I will try to address everything by the deadline for branching out 6.10.