Skip to content

Conversation

@amadio
Copy link
Member

@amadio amadio commented Jun 20, 2017

These changes are based on an analysis with VTune that showed that most waiting happened around TClass::LoadClassInfo(). Returning early without taking the lock when the information is already loaded yields very significant performance improvements. Images with each thread activity before and after the optimizations are attached. The x-axis represents time, light green means a thread is active, and dark green and brown mean running.

Before changes:
before

After changes:
after

@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

{
R__LOCKGUARD(gInterpreterMutex);
// Load class info if not already loaded and auto-parsing is not suspended
if (!fClassInfo && !gInterpreter->IsAutoParsingSuspended()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this to work properly fClassInfo needs to become an atomic (i.e. the test here can happen in the middle of fClassInfo being updated and lead to returning a partial set pointer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing the two tests is semantic incorrect. You can have a situation where fClassInfo is not set and when AutoParsing is suspended but fClassInfo can still be set because the information is already loaded/present in the AST.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot see how a pointer can be only partially set.

Your second comment does make sense, though. What do I need to avoid when auto parsing is suspended? It's not clear to me. I thought AutoParse(GetName()) should only be called if it's not suspended...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if fClassInfo is being set during the if statement, then the lock is acquired. By the time we can continue, fClassInfo should be either not set or fully set, so the check for early return will take care that we don't redo the work or return an incomplete result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot see how a pointer can be only partially set.

Re-read the 'raison d'etre' of atomics. Without fClassInfo being an atomic, there are race condition where a thread is writing to a memory area at the same time another thread is reading this same are and only a partial copy of the writing (let's the first 4 out of 8 bytes) is seen by the reader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do I need to avoid when auto parsing is suspended?
if (!fClassInfo && !gInterpreter->IsAutoParsingSuspended()) {

The point I am trying to make is that if fClassInfo is a nullptr and fCanLoadClassInfo is true and
IsAutoParsingSuspend is true, then there is still case where fClassInfo can be set (because the AST already contains the information.

What do I need to avoid when auto parsing is suspended?

Just the setting of fCanLoadClassInfo and the warning about the information missing since in this case the lookup failure (if any) was not fully authoritative ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if fClassInfo is being set during the if statement, then the lock is acquired. By the time we can continue, fClassInfo should be either not set or fully set, so the check for early return will take care that we don't redo the work or return an incomplete result.

Unfortunately not accurate. In the (rare, very rare but real and seen by CMSSW) cases where fClassInfo is partially filled when the if statement happens, then the function returns immediately and then there is (still) a chance that the reading thread will then proceed to try to use this partially filled value of fClassInfo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end, if fClassInfo is made atomic, the start of the function can be

if (fClassInfo == nullptr && fCanLoadClassInfo) { ....

or

if (fClassInfo != nullptr || !fCanLoadClassInfo) return;

I am not sure which one is more efficient ...
You may even want to use R__likely/R__unlikely

if (R__unlikely(fClassInfo == nullptr && fCanLoadClassInfo)) { ....
// or
if (R__likely(fClassInfo != nullptr || !fCanLoadClassInfo)) return;

Cheers,
Philippe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since fCanLoadClassInfo is atomic, we should probably short-circuit on it rather than fClassInfo. That is, we should use if (!fCanLoadClassInfo || fClassInfo) return; then set fCanLoadClassInfo to false once it's already loaded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have a good point in that case we can simplify it to just

if (R__likely(!fCanLoadClassInfo)) return; // fClassInfo already loaded or not loadable.
R___LOCKGUARD(gROOTMutext);
if (!fCanLoadClassInfo || fClassInfo) return; // in case some other thread beat us to the punch and we are already done.
...
// Previous version of the code

if (!gInterpreter->IsAutoParsingSuspended()) {
if (!fClassInfo) {
if (!fClassInfo)
gInterpreter->SetClassInfo(const_cast<TClass*>(this));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is called SetClassInfo. The comment is redundant in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ambiguity/unusual thing is that it sets a data member of the 'parameter' rather then object it is called upon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that every call to this function needs a comment to clarify that it sets the fClassInfo pointer? That doesn't make sense to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because of the odd pattern and the static_cast being used, yes I felt compelled to do so (well in 3 out of the 4 actual calls ... but if somehow I am wrong that the pattern is odd/confusing, then indeed it is fine to omit the comment ...

{
R__LOCKGUARD(gInterpreterMutex);
// Load class info if not already loaded and auto-parsing is not suspended
if (!fClassInfo && !gInterpreter->IsAutoParsingSuspended()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing the two tests is semantic incorrect. You can have a situation where fClassInfo is not set and when AutoParsing is suspended but fClassInfo can still be set because the information is already loaded/present in the AST.


// If another thread executed LoadClassInfo at about the same time
// as this thread return early since the work was done.
if (!fCanLoadClassInfo) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read correct this test is removed, why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the comment above this test is accurate, this should be covered by the test if (fClassInfo) return; right after acquiring the lock.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is not fully accurate. There is more semantic to fCanLoadClassInfo, it also cover the case where fClassInfo is null but the TClass knows it will not be able to load it (and thus there is not point in wasting CPU re-finding out that information).

And also fCanLoadClassInfo is an atomic.

@pcanal
Copy link
Member

pcanal commented Jun 20, 2017

On a general note, very nice find! Thanks!

Does the second graph means that in that scenario the 'merging' is now the bottle neck (i.e the one long thread)?

@amadio
Copy link
Member Author

amadio commented Jun 20, 2017

Yes, the second graph basically shows that the bottleneck is now the disk speed. :-)

@amadio
Copy link
Member Author

amadio commented Jun 20, 2017

I am asking for access to an NVMe SSD that is installed on this server. With that I will collect more scaling data, which should be much better than before.

@pcanal
Copy link
Member

pcanal commented Jun 20, 2017

The graph 'seems' to also indicates that there is only 'one' cluster per thread .. is that plausible?
I am deriving this guess from the fact that the merger thread seems to only start doing something after the worker thread has mostly done all their work.

@phsft-bot
Copy link

Build failed on centos7/gcc49.
See console output.

Warnings:

Failing tests:

@phsft-bot
Copy link

Build failed on slc6/gcc49.
See console output.

Warnings:

Failing tests:

@pcanal
Copy link
Member

pcanal commented Jun 20, 2017

I am asking for access to an NVMe SSD that is installed on this server.

To emulate an infinite speed disk, I suspect you can also write the file named "/dev/null" :)

More realistically, this is actually good news and we ought to be able to compare the run-time of this process to the run-time of the equivalent copy or dd of full resulting file ... humm this would involve/include the reading so a completely fair comparison ....

@amadio
Copy link
Member Author

amadio commented Jun 20, 2017

Well, the failing test shows that there is indeed a problem with my changes. I will push a new version shortly.

@amadio amadio force-pushed the parallel-io-optimizations branch from 4e0d5b3 to ce7a646 Compare June 20, 2017 16:09
@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

@amadio amadio force-pushed the parallel-io-optimizations branch from ce7a646 to c6fe0e8 Compare June 20, 2017 16:11
@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

@amadio
Copy link
Member Author

amadio commented Jun 20, 2017

I removed the other two commits because I still see a race condition on TList::RecursiveRemove that causes a crash of the TBufferMerger test.

@phsft-bot
Copy link

Build failed on centos7/gcc49.
See console output.

Errors:

  • /mnt/build/jenkins/workspace/root-pullrequests-build/root/core/meta/src/TClass.cxx:5468:29: error: ‘kFalse’ was not declared in this scope
  • /mnt/build/jenkins/workspace/root-pullrequests-build/root/core/meta/src/TClass.cxx:5486:27: error: ‘kFalse’ was not declared in this scope

Warnings:

@pcanal
Copy link
Member

pcanal commented Jun 20, 2017

I removed the other two commits because I still see a race condition on
What were the stack trace?

@phsft-bot
Copy link

Build failed on mac1012/native.
See console output.

Errors:

  • /Volumes/HDD2/ec/build/workspace/root-pullrequests-build/root/core/meta/src/TClass.cxx:5468:29: error: use of undeclared identifier 'kFalse'
  • /Volumes/HDD2/ec/build/workspace/root-pullrequests-build/root/core/meta/src/TClass.cxx:5486:27: error: use of undeclared identifier 'kFalse'; did you mean 'false'?

Warnings:

@amadio amadio force-pushed the parallel-io-optimizations branch from c6fe0e8 to a8281a6 Compare June 20, 2017 16:15
@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

@amadio
Copy link
Member Author

amadio commented Jun 20, 2017

...
[ RUN      ] TBufferMerger.ParallelTreeFill

 *** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================

Thread 2 (Thread 0x7f75ebd7a700 (LWP 834)):
#0  0x00007f75eee0eb7b in __GI___waitpid (pid=841, stat_loc=stat_loc
entry=0x7f75ebd72bc0, options=options
entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1  0x00007f75eed961eb in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:148
#2  0x00007f75eff097d8 in TUnixSystem::Exec (shellcmd=<optimized out>, this=0x1e074f0) at /home/amadio/src/root/core/unix/src/TUnixSystem.cxx:2118
#3  TUnixSystem::StackTrace (this=0x1e074f0) at /home/amadio/src/root/core/unix/src/TUnixSystem.cxx:2412
#4  0x00007f75eff0be1c in TUnixSystem::DispatchSignals (this=0x1e074f0, sig=kSigSegmentationViolation) at /home/amadio/src/root/core/unix/src/TUnixSystem.cxx:3643
#5  <signal handler called>
#6  TList::RecursiveRemove (this=0x1e0d6c0, obj=0x7f75e4d04b40) at /home/amadio/src/root/core/cont/src/TList.cxx:717
#7  0x00007f75efe885b3 in THashList::RecursiveRemove (this=<optimized out>, obj=0x7f75e4d04b40) at /home/amadio/src/root/core/cont/src/THashList.cxx:286
#8  0x00007f75efe25047 in TObject::~TObject (this=0x7f75e4d04b40, __in_chrg=<optimized out>) at /home/amadio/src/root/core/base/src/TObject.cxx:88
#9  0x00007f75f0eef6a9 in TTree::~TTree (this=0x7f75e4d04b40, __in_chrg=<optimized out>) at /home/amadio/src/root/tree/tree/src/TTree.cxx:917
#10 0x00007f75efeba961 in TClass::Destructor (this=this
entry=0x1e81470, obj=obj
entry=0x7f75e4d04b40, dtorOnly=dtorOnly
entry=false) at /home/amadio/src/root/core/meta/src/TClass.cxx:5074
#11 0x00007f75f0758a38 in TFileMerger::MergeRecursive (this=0x7f75ebd75a90, target=0x7f75e4dd2780, sourcelist=0x7f75e4844ec0, type=14) at /home/amadio/src/root/io/io/src/TFileMerger.cxx:756
#12 0x00007f75f0756e37 in TFileMerger::PartialMerge (this=this
entry=0x7f75ebd75a90, in_type=in_type
entry=14) at /home/amadio/src/root/io/io/src/TFileMerger.cxx:842
#13 0x00007f75f073097e in ROOT::Experimental::TBufferMerger::WriteOutputFile (this=0x7fff3e662d00) at /home/amadio/src/root/io/io/src/TBufferMerger.cxx:90
#14 0x00007f75ef904d9e in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/libstdc++.so.6
#15 0x00007f75ef0f43f4 in start_thread (arg=0x7f75ebd7a700) at pthread_create.c:333
#16 0x00007f75eee3f65f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

Thread 1 (Thread 0x7f75f1325a80 (LWP 771)):
#0  0x00007f75ef0f56cd in pthread_join (threadid=140144444679936, thread_return=0x0) at pthread_join.c:90
#1  0x00007f75ef905063 in std::thread::join() () from /usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/libstdc++.so.6
#2  0x00007f75f072fd76 in ROOT::Experimental::TBufferMerger::~TBufferMerger (this=0x7fff3e662d00, __in_chrg=<optimized out>) at /home/amadio/src/root/io/io/src/TBufferMerger.cxx:35
#3  0x00000000004182f0 in TBufferMerger_ParallelTreeFill_Test::TestBody (this=<optimized out>) at /home/amadio/src/root/io/io/test/TBufferMerger.cxx:86
#4  0x000000000043fa74 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x442444 "the test body", method=<optimized out>, object=<optimized out>) at /home/amadio/build/root/googletest-prefix/src/googletest/googletest/src/gtest.cc:2402
#5  testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object
entry=0x1e5e370, method=<optimized out>, location=location
entry=0x442444 "the test body") at /home/amadio/build/root/googletest-prefix/src/googletest/googletest/src/gtest.cc:2438
#6  0x00000000004365aa in testing::Test::Run (this=0x1e5e370) at /home/amadio/build/root/googletest-prefix/src/googletest/googletest/src/gtest.cc:2475
#7  0x00000000004366f8 in testing::TestInfo::Run (this=0x1e5e520) at /home/amadio/build/root/googletest-prefix/src/googletest/googletest/src/gtest.cc:2656
#8  0x00000000004367d5 in testing::TestCase::Run (this=0x1e5e0a0) at /home/amadio/build/root/googletest-prefix/src/googletest/googletest/src/gtest.cc:2774
#9  0x0000000000436ab7 in testing::internal::UnitTestImpl::RunAllTests (this=0x1e5de00) at /home/amadio/build/root/googletest-prefix/src/googletest/googletest/src/gtest.cc:4649
#10 0x000000000043ff24 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (location=0x4419b8 "auxiliary test code (environments or event listeners)", method=<optimized out>, object=<optimized out>) at /home/amadio/build/root/googletest-prefix/src/googletest/googletest/src/gtest.cc:2402
#11 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x1e5de00, method=<optimized out>, location=location
entry=0x4419b8 "auxiliary test code (environments or event listeners)") at /home/amadio/build/root/googletest-prefix/src/googletest/googletest/src/gtest.cc:2438
#12 0x0000000000436dd0 in testing::UnitTest::Run (this=0x650860 <testing::UnitTest::GetInstance()::instance>) at /home/amadio/build/root/googletest-prefix/src/googletest/googletest/src/gtest.cc:4257
#13 0x0000000000417c59 in RUN_ALL_TESTS () at /home/amadio/build/root/googletest-prefix/src/googletest/googletest/include/gtest/gtest.h:2233
#14 main (argc=<optimized out>, argv=0x7fff3e663168) at /home/amadio/build/root/googletest-prefix/src/googletest/googletest/src/gtest_main.cc:37
===========================================================


The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum.
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  TList::RecursiveRemove (this=0x1e0d6c0, obj=0x7f75e4d04b40) at /home/amadio/src/root/core/cont/src/TList.cxx:717
#7  0x00007f75efe885b3 in THashList::RecursiveRemove (this=<optimized out>, obj=0x7f75e4d04b40) at /home/amadio/src/root/core/cont/src/THashList.cxx:286
#8  0x00007f75efe25047 in TObject::~TObject (this=0x7f75e4d04b40, __in_chrg=<optimized out>) at /home/amadio/src/root/core/base/src/TObject.cxx:88
#9  0x00007f75f0eef6a9 in TTree::~TTree (this=0x7f75e4d04b40, __in_chrg=<optimized out>) at /home/amadio/src/root/tree/tree/src/TTree.cxx:917
#10 0x00007f75efeba961 in TClass::Destructor (this=this
entry=0x1e81470, obj=obj
entry=0x7f75e4d04b40, dtorOnly=dtorOnly
entry=false) at /home/amadio/src/root/core/meta/src/TClass.cxx:5074
#11 0x00007f75f0758a38 in TFileMerger::MergeRecursive (this=0x7f75ebd75a90, target=0x7f75e4dd2780, sourcelist=0x7f75e4844ec0, type=14) at /home/amadio/src/root/io/io/src/TFileMerger.cxx:756
#12 0x00007f75f0756e37 in TFileMerger::PartialMerge (this=this
entry=0x7f75ebd75a90, in_type=in_type
entry=14) at /home/amadio/src/root/io/io/src/TFileMerger.cxx:842
#13 0x00007f75f073097e in ROOT::Experimental::TBufferMerger::WriteOutputFile (this=0x7fff3e662d00) at /home/amadio/src/root/io/io/src/TBufferMerger.cxx:90
#14 0x00007f75ef904d9e in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/libstdc++.so.6
#15 0x00007f75ef0f43f4 in start_thread (arg=0x7f75ebd7a700) at pthread_create.c:333
#16 0x00007f75eee3f65f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

===========================================================


Fatal in <TFileMerger::RecursiveRemove>: Output file of the TFile Merger (targeting tbuffermerger_parallel.root) has been deleted (likely due to a TTree larger than 100Gb)
aborting

@phsft-bot
Copy link

Build failed on slc6/gcc62.
See console output.

Errors:

  • ninja: error: loading 'build.ninja': No such file or directory

Warnings:

@pcanal
Copy link
Member

pcanal commented Jun 20, 2017

annoying :( ... the 'other' thread is already gone ....

@phsft-bot
Copy link

Build failed on centos7/gcc49.
See console output.

Errors:

  • /mnt/build/jenkins/workspace/root-pullrequests-build/root/core/meta/src/TClass.cxx:5486:27: error: ‘kFalse’ was not declared in this scope

Warnings:

@amadio amadio force-pushed the parallel-io-optimizations branch from a8281a6 to 0fa8ca8 Compare June 20, 2017 16:20
@amadio
Copy link
Member Author

amadio commented Jun 20, 2017

Fixed typo kFalse -> kFALSE.

@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

"no interpreter information for class %s is available even though it has a TClass initialization routine.",
fName.Data());
}
fCanLoadClassInfo = kFALSE;
Copy link
Member

@pcanal pcanal Jun 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be self consistent, we need

if (autoParse) fCanLoadClassInfo = kFALSE;

When auto parsing is disabled, the lookup answer is not authoritative ....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it not be loaded by other means as you said before? I'm confused.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After SetClassInfo is done we can be in many state:

  1. fClassInfo is now a value, we are all good, no new to try reloading (independently of the auto-parsing flag)
  2. fClassInfo is still a nullptr but auto-parsing is enabled, there is no more information to be have anywhere, the AST decl for the class does not exist
  3. fClassInfo is still anullptr but auto-parsing is disabled, we now know the AST decl for the class is not in the AST but it might be brought into the AST if we were to retry with auto-parsing on (because the header declaring it might be parsed).

// If another thread executed LoadClassInfo at about the same time
// as this thread return early since the work was done.
if (!fCanLoadClassInfo) return;
bool AutoParse = !gInterpreter->IsAutoParsingSuspended();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming rule is that variable name start with a lower case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can change that.

// class was registered by a dictionary, but the info came from reading
// the pch.
// Note: This check avoids using AutoParse for classes in the pch!
if (fClassInfo) {
Copy link
Member

@Axel-Naumann Axel-Naumann Jun 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not need this check anymore? Can we return early if fClassInfo (after making it a unique_ptr)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When fClassInfo is set, fCanLoadClassInfo is set to false, so this test is redundant with the one right above it. I could check for fClassInfo != nullptr before taking the lock, but fClassInfo is not atomic, while fCanLoadClassInfo is. Therefore, I use the latter exclusively for checking and returning early.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining @amadio And @pcanal bad naming :-) "can load" is the negation of "have loaded"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename the variable to fClassInfoLoaded or similar and use accordingly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Axel,

Well you may not have read the doc fully :). Orginally 'fCanLoadClassInfo' was really indicating whether there was a chance to be able to load the ClassInfo or not ... and it still does in a way.

At the moment when fCanLoadClassInfo is false, fClassInfo can be either a valid pointer or a nullptr, so "have loaded" would be inaccurate.

In other word, if fCanLoadClassInfo is true running the body of LoadClassInfo is likely to yield a change in the value of fClassInfo, when it is false (and whether or not we already ran it) there is no point in running the body of LoadClassInfo as it is guaranteed to not yield any change in the value of fClassInfo.

Cheers,
Philippe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Philippe!

@phsft-bot
Copy link

Build failed on mac1012/native.
See console output.

Warnings:

Failing tests:

@phsft-bot
Copy link

Build failed on slc6/gcc62.
See console output.

Warnings:

Failing tests:

@phsft-bot
Copy link

Build failed on centos7/gcc49.
See console output.

Warnings:

Failing tests:

@phsft-bot
Copy link

Build failed on ubuntu14/native.
See console output.

Warnings:

Failing tests:

@amadio amadio force-pushed the parallel-io-optimizations branch from eb98b89 to 3450731 Compare June 22, 2017 08:40
@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

@phsft-bot
Copy link

@phsft-bot
Copy link

Build failed on slc6/gcc49.
See console output.

Warnings:

Failing tests:

@phsft-bot
Copy link

Build failed on slc6/gcc62.
See console output.

Warnings:

Failing tests:

@amadio
Copy link
Member Author

amadio commented Jun 22, 2017

@pcanal Could you please take a look at this test? http://cdash.cern.ch/testDetails.php?test=26224986&build=367429 It could be failing because of my commit, right?

amadio added 2 commits June 22, 2017 12:23
Since fCanLoadClassInfo is atomic, the lock is not needed when checking
for it. Returning early without waiting when the class info is already
loaded yields significant performance improvements when writing in parallel.
@amadio amadio force-pushed the parallel-io-optimizations branch from 3450731 to a5441f4 Compare June 22, 2017 10:23
@phsft-bot
Copy link

Starting build on gcc49/centos7, native/mac1012, gcc49/slc6, gcc62/slc6, native/ubuntu14 with flags -Dvc=OFF -Dimt=ON -Dccache=ON
How to customize builds

@pcanal
Copy link
Member

pcanal commented Jun 22, 2017

i guess it could .... but I am not sure how .... [Besides, today's version is still sub-optimal, i.e. a WIP]

@amadio
Copy link
Member Author

amadio commented Jun 22, 2017

Yes, this is work in progress, I will make several more changes to TClass based on my analysis with VTune. The current situation is shown in the figure below.

screenshot

The wait time is still somewhat high, but the situation with only this PR's commits is still much better than it currently is on master.

@amadio
Copy link
Member Author

amadio commented Jun 22, 2017

@pcanal Jenkins passed. Should I merge this, or should we gather all changes first?

@pcanal
Copy link
Member

pcanal commented Jun 22, 2017

I am confused ... the PR contains VC change and only 2 changes in TClass ... and the one change is still introduced a decrease of the single-thread performance (missing if (fClassInfo) and/or removal of the fClassInfo!=nullptr && fCanLoadClassInfo.)

@amadio
Copy link
Member Author

amadio commented Jun 22, 2017

The commit for Vc/VecCore sneaked in from another branch. I added it just to shut up warnings because of how things are on master at the moment, but it's not part of this PR (i.e. I will remove it prior to merging). I don't see a decrease of single thread performance, did you measure it?

@pcanal
Copy link
Member

pcanal commented Jun 22, 2017

I don't see a decrease of single thread performance, did you measure it?

I know and have tried to explain it earlier.

There is a (quite) common case where a TClass is created (running TClass::Init) and ends up with fClassInfo != nullptr but fCanLoadClassInfo == true. In this commong case the removal of the
if (fClassInfo) return;
means that we do a useless (expensive) lookup (or at least series of function calls) for something already set and know. [On the other hand because that early return is NOT setting fCanLoadClassInfo it leads to LoadClassInfo being call often and the lock being taken]

We either need to re-add this if statement OR better yet make sure that all cases of fClassInfo!=nullptr also have fCanLoadClassInfo == false.

If you are still not convince, you can do the following test:
a) use the master
b) have the if (fStremearInfo) also set fCanLoadClassInfo to false.
and you will notice that the performance improvement is equal and/or similar to the one with your patch.

@amadio
Copy link
Member Author

amadio commented Jun 23, 2017

I am not sure that setting fCanLoadClassInfo to false whenever fClassInfo != nullptr will have no other side effects. I see at least a few places where fCanLoadClassInfo is used that could be affected by this change. TClass is too important to risk breaking, so at this point I think the best thing to do is to close this PR and to delegate making the right change to you, since you know the I/O system a lot better than me.

@amadio amadio closed this Jun 23, 2017
@amadio amadio deleted the parallel-io-optimizations branch June 23, 2017 06:42
@Axel-Naumann
Copy link
Member

Re-adding if (fClassInfo) return; (as Philippe suggested) sounded like the more effective path forward :-( No chance for you to have your patch not touch that if, and then ask roottest? The world would be a better place with this PR in!

@amadio
Copy link
Member Author

amadio commented Jun 23, 2017

@Axel-Naumann Well, @pcanal disagrees. Philippe wants to just add fCanLoadClassInfo = kFALSE; right after line 1413 of TClass.cxx, so that the calls to LoadClassInfo() which are guarded by if (fCanLoadClassInfo) are never made. However, I am not sure this change is safe yet.

That said, we surely need to fix this problem (given the difference seen in VTune), and I will keep working on a solution and will probably open a new pull request with either this change or similar changes after enough testing to ensure that everything is ok.

ellert added a commit to ellert/root that referenced this pull request Nov 17, 2022
…dition

          Start  672: tutorial-io-loopdir
 690/1156 Test  root-project#666: tutorial-io-double32 ................................................   Passed    2.42 sec
          Start  673: tutorial-io-loopdir11
 691/1156 Test  root-project#673: tutorial-io-loopdir11 ...............................................   Passed    0.76 sec
          Start  674: tutorial-io-mergeSelective
 692/1156 Test  root-project#672: tutorial-io-loopdir .................................................***Failed  Error regular expression found in output. Regex=[Error in <]  1.23 sec
Processing /builddir/build/BUILD/root-6.26.10/tutorials/io/loopdir.C...
Info in <TCanvas::Print>: ps file hsimple.ps has been created
Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps
Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps
Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps
Info in <TCanvas::Print>: ps file hsimple.ps has been closed
Error in <TPostScript::Text>: Cannot open temporary file: hsimple.ps_tmp_2089748
couet pushed a commit that referenced this pull request Nov 17, 2022
…dition (#11725)

Start  672: tutorial-io-loopdir
 690/1156 Test  #666: tutorial-io-double32 ................................................   Passed    2.42 sec
          Start  673: tutorial-io-loopdir11
 691/1156 Test  #673: tutorial-io-loopdir11 ...............................................   Passed    0.76 sec
          Start  674: tutorial-io-mergeSelective
 692/1156 Test  #672: tutorial-io-loopdir .................................................***Failed  Error regular expression found in output. Regex=[Error in <]  1.23 sec
Processing /builddir/build/BUILD/root-6.26.10/tutorials/io/loopdir.C...
Info in <TCanvas::Print>: ps file hsimple.ps has been created
Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps
Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps
Info in <TCanvas::Print>: Current canvas added to ps file hsimple.ps
Info in <TCanvas::Print>: ps file hsimple.ps has been closed
Error in <TPostScript::Text>: Cannot open temporary file: hsimple.ps_tmp_2089748
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants