-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Optimizations for TBufferMerger and parallel I/O in general #666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Starting build on |
core/meta/src/TClass.cxx
Outdated
| { | ||
| R__LOCKGUARD(gInterpreterMutex); | ||
| // Load class info if not already loaded and auto-parsing is not suspended | ||
| if (!fClassInfo && !gInterpreter->IsAutoParsingSuspended()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this to work properly fClassInfo needs to become an atomic (i.e. the test here can happen in the middle of fClassInfo being updated and lead to returning a partial set pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixing the two tests is semantic incorrect. You can have a situation where fClassInfo is not set and when AutoParsing is suspended but fClassInfo can still be set because the information is already loaded/present in the AST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot see how a pointer can be only partially set.
Your second comment does make sense, though. What do I need to avoid when auto parsing is suspended? It's not clear to me. I thought AutoParse(GetName()) should only be called if it's not suspended...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, if fClassInfo is being set during the if statement, then the lock is acquired. By the time we can continue, fClassInfo should be either not set or fully set, so the check for early return will take care that we don't redo the work or return an incomplete result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot see how a pointer can be only partially set.
Re-read the 'raison d'etre' of atomics. Without fClassInfo being an atomic, there are race condition where a thread is writing to a memory area at the same time another thread is reading this same are and only a partial copy of the writing (let's the first 4 out of 8 bytes) is seen by the reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do I need to avoid when auto parsing is suspended?
if (!fClassInfo && !gInterpreter->IsAutoParsingSuspended()) {
The point I am trying to make is that if fClassInfo is a nullptr and fCanLoadClassInfo is true and
IsAutoParsingSuspend is true, then there is still case where fClassInfo can be set (because the AST already contains the information.
What do I need to avoid when auto parsing is suspended?
Just the setting of fCanLoadClassInfo and the warning about the information missing since in this case the lookup failure (if any) was not fully authoritative ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, if fClassInfo is being set during the if statement, then the lock is acquired. By the time we can continue, fClassInfo should be either not set or fully set, so the check for early return will take care that we don't redo the work or return an incomplete result.
Unfortunately not accurate. In the (rare, very rare but real and seen by CMSSW) cases where fClassInfo is partially filled when the if statement happens, then the function returns immediately and then there is (still) a chance that the reading thread will then proceed to try to use this partially filled value of fClassInfo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end, if fClassInfo is made atomic, the start of the function can be
if (fClassInfo == nullptr && fCanLoadClassInfo) { ....
or
if (fClassInfo != nullptr || !fCanLoadClassInfo) return;
I am not sure which one is more efficient ...
You may even want to use R__likely/R__unlikely
if (R__unlikely(fClassInfo == nullptr && fCanLoadClassInfo)) { ....
// or
if (R__likely(fClassInfo != nullptr || !fCanLoadClassInfo)) return;
Cheers,
Philippe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since fCanLoadClassInfo is atomic, we should probably short-circuit on it rather than fClassInfo. That is, we should use if (!fCanLoadClassInfo || fClassInfo) return; then set fCanLoadClassInfo to false once it's already loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have a good point in that case we can simplify it to just
if (R__likely(!fCanLoadClassInfo)) return; // fClassInfo already loaded or not loadable.
R___LOCKGUARD(gROOTMutext);
if (!fCanLoadClassInfo || fClassInfo) return; // in case some other thread beat us to the punch and we are already done.
...
// Previous version of the code
| if (!gInterpreter->IsAutoParsingSuspended()) { | ||
| if (!fClassInfo) { | ||
| if (!fClassInfo) | ||
| gInterpreter->SetClassInfo(const_cast<TClass*>(this)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is called SetClassInfo. The comment is redundant in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ambiguity/unusual thing is that it sets a data member of the 'parameter' rather then object it is called upon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that every call to this function needs a comment to clarify that it sets the fClassInfo pointer? That doesn't make sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because of the odd pattern and the static_cast being used, yes I felt compelled to do so (well in 3 out of the 4 actual calls ... but if somehow I am wrong that the pattern is odd/confusing, then indeed it is fine to omit the comment ...
core/meta/src/TClass.cxx
Outdated
| { | ||
| R__LOCKGUARD(gInterpreterMutex); | ||
| // Load class info if not already loaded and auto-parsing is not suspended | ||
| if (!fClassInfo && !gInterpreter->IsAutoParsingSuspended()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixing the two tests is semantic incorrect. You can have a situation where fClassInfo is not set and when AutoParsing is suspended but fClassInfo can still be set because the information is already loaded/present in the AST.
|
|
||
| // If another thread executed LoadClassInfo at about the same time | ||
| // as this thread return early since the work was done. | ||
| if (!fCanLoadClassInfo) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read correct this test is removed, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the comment above this test is accurate, this should be covered by the test if (fClassInfo) return; right after acquiring the lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is not fully accurate. There is more semantic to fCanLoadClassInfo, it also cover the case where fClassInfo is null but the TClass knows it will not be able to load it (and thus there is not point in wasting CPU re-finding out that information).
And also fCanLoadClassInfo is an atomic.
|
On a general note, very nice find! Thanks! Does the second graph means that in that scenario the 'merging' is now the bottle neck (i.e the one long thread)? |
|
Yes, the second graph basically shows that the bottleneck is now the disk speed. :-) |
|
I am asking for access to an NVMe SSD that is installed on this server. With that I will collect more scaling data, which should be much better than before. |
|
The graph 'seems' to also indicates that there is only 'one' cluster per thread .. is that plausible? |
|
Build failed on centos7/gcc49. Warnings:
Failing tests: |
|
Build failed on slc6/gcc49. Warnings:
Failing tests: |
To emulate an infinite speed disk, I suspect you can also write the file named "/dev/null" :) More realistically, this is actually good news and we ought to be able to compare the run-time of this process to the run-time of the equivalent copy or dd of full resulting file ... humm this would involve/include the reading so a completely fair comparison .... |
|
Well, the failing test shows that there is indeed a problem with my changes. I will push a new version shortly. |
4e0d5b3 to
ce7a646
Compare
|
Starting build on |
ce7a646 to
c6fe0e8
Compare
|
Starting build on |
|
I removed the other two commits because I still see a race condition on |
|
Build failed on centos7/gcc49. Errors:
Warnings:
|
|
|
Build failed on mac1012/native. Errors:
Warnings:
|
c6fe0e8 to
a8281a6
Compare
|
Starting build on |
|
|
Build failed on slc6/gcc62. Errors:
Warnings:
|
|
annoying :( ... the 'other' thread is already gone .... |
|
Build failed on centos7/gcc49. Errors:
Warnings:
|
a8281a6 to
0fa8ca8
Compare
|
Fixed typo |
|
Starting build on |
core/meta/src/TClass.cxx
Outdated
| "no interpreter information for class %s is available even though it has a TClass initialization routine.", | ||
| fName.Data()); | ||
| } | ||
| fCanLoadClassInfo = kFALSE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be self consistent, we need
if (autoParse) fCanLoadClassInfo = kFALSE;
When auto parsing is disabled, the lookup answer is not authoritative ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it not be loaded by other means as you said before? I'm confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After SetClassInfo is done we can be in many state:
- fClassInfo is now a value, we are all good, no new to try reloading (independently of the auto-parsing flag)
- fClassInfo is still a nullptr but auto-parsing is enabled, there is no more information to be have anywhere, the AST decl for the class does not exist
- fClassInfo is still anullptr but auto-parsing is disabled, we now know the AST decl for the class is not in the AST but it might be brought into the AST if we were to retry with auto-parsing on (because the header declaring it might be parsed).
core/meta/src/TClass.cxx
Outdated
| // If another thread executed LoadClassInfo at about the same time | ||
| // as this thread return early since the work was done. | ||
| if (!fCanLoadClassInfo) return; | ||
| bool AutoParse = !gInterpreter->IsAutoParsingSuspended(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming rule is that variable name start with a lower case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I can change that.
| // class was registered by a dictionary, but the info came from reading | ||
| // the pch. | ||
| // Note: This check avoids using AutoParse for classes in the pch! | ||
| if (fClassInfo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we not need this check anymore? Can we return early if fClassInfo (after making it a unique_ptr)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When fClassInfo is set, fCanLoadClassInfo is set to false, so this test is redundant with the one right above it. I could check for fClassInfo != nullptr before taking the lock, but fClassInfo is not atomic, while fCanLoadClassInfo is. Therefore, I use the latter exclusively for checking and returning early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename the variable to fClassInfoLoaded or similar and use accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Axel,
Well you may not have read the doc fully :). Orginally 'fCanLoadClassInfo' was really indicating whether there was a chance to be able to load the ClassInfo or not ... and it still does in a way.
At the moment when fCanLoadClassInfo is false, fClassInfo can be either a valid pointer or a nullptr, so "have loaded" would be inaccurate.
In other word, if fCanLoadClassInfo is true running the body of LoadClassInfo is likely to yield a change in the value of fClassInfo, when it is false (and whether or not we already ran it) there is no point in running the body of LoadClassInfo as it is guaranteed to not yield any change in the value of fClassInfo.
Cheers,
Philippe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Philippe!
|
Build failed on mac1012/native. Warnings:
Failing tests: |
|
Build failed on slc6/gcc62. Warnings:
Failing tests: |
|
Build failed on centos7/gcc49. Warnings:
Failing tests: |
|
Build failed on ubuntu14/native. Warnings:
Failing tests: |
eb98b89 to
3450731
Compare
|
Starting build on |
|
Build failed on mac1012/native. Warnings:
Failing tests: |
|
Build failed on slc6/gcc49. Warnings:
Failing tests: |
|
Build failed on slc6/gcc62. Warnings:
Failing tests: |
|
@pcanal Could you please take a look at this test? http://cdash.cern.ch/testDetails.php?test=26224986&build=367429 It could be failing because of my commit, right? |
Since fCanLoadClassInfo is atomic, the lock is not needed when checking for it. Returning early without waiting when the class info is already loaded yields significant performance improvements when writing in parallel.
3450731 to
a5441f4
Compare
|
Starting build on |
|
i guess it could .... but I am not sure how .... [Besides, today's version is still sub-optimal, i.e. a WIP] |
|
@pcanal Jenkins passed. Should I merge this, or should we gather all changes first? |
|
I am confused ... the PR contains VC change and only 2 changes in TClass ... and the one change is still introduced a decrease of the single-thread performance (missing if (fClassInfo) and/or removal of the fClassInfo!=nullptr && fCanLoadClassInfo.) |
|
The commit for Vc/VecCore sneaked in from another branch. I added it just to shut up warnings because of how things are on master at the moment, but it's not part of this PR (i.e. I will remove it prior to merging). I don't see a decrease of single thread performance, did you measure it? |
I know and have tried to explain it earlier. There is a (quite) common case where a TClass is created (running TClass::Init) and ends up with fClassInfo != nullptr but fCanLoadClassInfo == true. In this commong case the removal of the We either need to re-add this if statement OR better yet make sure that all cases of fClassInfo!=nullptr also have fCanLoadClassInfo == false. If you are still not convince, you can do the following test: |
|
I am not sure that setting |
|
Re-adding |
|
@Axel-Naumann Well, @pcanal disagrees. Philippe wants to just add That said, we surely need to fix this problem (given the difference seen in VTune), and I will keep working on a solution and will probably open a new pull request with either this change or similar changes after enough testing to ensure that everything is ok. |
…dition
Start 672: tutorial-io-loopdir
690/1156 Test root-project#666: tutorial-io-double32 ................................................ Passed 2.42 sec
Start 673: tutorial-io-loopdir11
691/1156 Test root-project#673: tutorial-io-loopdir11 ............................................... Passed 0.76 sec
Start 674: tutorial-io-mergeSelective
692/1156 Test root-project#672: tutorial-io-loopdir .................................................***Failed Error regular expression found in output. Regex=[Error in <] 1.23 sec
Processing /builddir/build/BUILD/root-6.26.10/tutorials/io/loopdir.C...
Info in <TCanvas::Print>: ps file hsimple.ps has been created
Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps
Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps
Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps
Info in <TCanvas::Print>: ps file hsimple.ps has been closed
Error in <TPostScript::Text>: Cannot open temporary file: hsimple.ps_tmp_2089748
…dition (#11725) Start 672: tutorial-io-loopdir 690/1156 Test #666: tutorial-io-double32 ................................................ Passed 2.42 sec Start 673: tutorial-io-loopdir11 691/1156 Test #673: tutorial-io-loopdir11 ............................................... Passed 0.76 sec Start 674: tutorial-io-mergeSelective 692/1156 Test #672: tutorial-io-loopdir .................................................***Failed Error regular expression found in output. Regex=[Error in <] 1.23 sec Processing /builddir/build/BUILD/root-6.26.10/tutorials/io/loopdir.C... Info in <TCanvas::Print>: ps file hsimple.ps has been created Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps Info in <TCanvas::Print>: ps file hsimple.ps has been closed Error in <TPostScript::Text>: Cannot open temporary file: hsimple.ps_tmp_2089748

These changes are based on an analysis with VTune that showed that most waiting happened around
TClass::LoadClassInfo(). Returning early without taking the lock when the information is already loaded yields very significant performance improvements. Images with each thread activity before and after the optimizations are attached. The x-axis represents time, light green means a thread is active, and dark green and brown mean running.Before changes:

After changes:
