Skip to content

Conversation

@vgvassilev
Copy link
Member

If we call TCling::Load on an already loaded library we dlclose and dlopen the library. However, currently we do not have a facility to 'reload'/undo the effect of TCling::LoadPCM.

This patch adds a section in the dictionary for de-registration which is reverse to TCling::RegisterModule -- TCling::UnRegisterModule. It tracks down which library is being reloaded and does not trigger a re-read of the rdict pcm. The current stub is can be further expanded to undo the effects caused by TCling::RegisterModule and/or improve the TCling shutdown by running the interpreter-dependent shutdown of the dictionary.

The intent of this patch is to fix the failing OSX tests with

Error in TCling::LoadPCM: ROOT PCM /.../libTree_rdict.pcm file does not exist
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImageGui_rdict.pcm
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImage_rdict.pcm

@vgvassilev vgvassilev requested a review from pcanal as a code owner April 21, 2020 13:52
@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos7-multicore/default, ROOT-fedora29/python3, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1015/cxx17, windows10/cxx14
How to customize builds

" ~DictInit() {\n"
" assert (isInitialized);"
" TROOT::UnRegisterModule(\"" << demangledDictName << "\","
"TriggerDictionaryInitialization_" << dictName << "_Impl);\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that this is misleading (albeit harmeless) as you would a want a UnTrigger function here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use that function to find back the shared object. We can avoid that by keeping a mapping in TCling between modulename and shared object. Then this function will take only one argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough. thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall I make that change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that (but feel free not too if you prefer to leave it as is).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcanal, could you provide an implementation of TCling::UnLoadPCM?

@vgvassilev
Copy link
Member Author

@phsft-bot build with flags -DCTEST_TEST_EXCLUDE_NONE=On

@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos7-multicore/default, ROOT-fedora29/python3, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1015/cxx17, windows10/cxx14 with flags -DCTEST_TEST_EXCLUDE_NONE=On
How to customize builds

@phsft-bot
Copy link

Build failed on mac1015/cxx17.
Running on macphsft18.dyndns.cern.ch:/Users/sftnight/build/jenkins/workspace/root-pullrequests-build
See cdash.
See console output.

Errors:

  • [2020-04-21T18:34:31.074Z] FAILED: tree/treeplayer/G__TreePlayer.cxx lib/TreePlayer.pcm

@phsft-bot
Copy link

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See cdash.
See console output.

Errors:

  • [2020-04-21T18:45:11.191Z] C:\build\workspace\root-pullrequests-build\build\core\G__Core.cxx(27522,9): error C2065: 'isInitialized': undeclared identifier [C:\build\workspace\root-pullrequests-build\build\core\G__Core.vcxproj]

@phsft-bot
Copy link

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See cdash.
See console output.

@phsft-bot
Copy link

Build failed on ROOT-fedora29/python3.
Running on root-fedora29-2.cern.ch:/build/workspace/root-pullrequests-build
See cdash.
See console output.

vgvassilev added a commit to vgvassilev/root that referenced this pull request Apr 21, 2020
When a library is re-loaded (via .L) we should reload the corresponding rdict
file. However current ROOT (without modules) just re-reads the rdict file.

This patch brings this behavior for modules and is intended to be a quick fix
for the os nightly failures of kind:

Error in TCling::LoadPCM: ROOT PCM /.../libTree_rdict.pcm file does not exist
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImageGui_rdict.pcm
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImage_rdict.pcmError in TCling::LoadPCM: ROOT PCM /.../libTree_rdict.pcm file does not exist
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImageGui_rdict.pcm
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImage_rdict.pcm

The continuation of this work is in root-project#5420
@pcanal
Copy link
Member

pcanal commented Apr 21, 2020

If we call TCling::Load on an already loaded library we dlclose and dlopen the library.

Do we still distinguish (as I think we ought to) between the case .L which might reload the library (probably 'only' if it changed) and the case gSystem->Load which should not unload the library (and for that matter I would also not really expect .L name.so to reload the library (by .L script.C yes and .L script.C+ yes if the library was recreated).

vgvassilev added a commit that referenced this pull request Apr 22, 2020
When a library is re-loaded (via .L) we should reload the corresponding rdict
file. However current ROOT (without modules) just re-reads the rdict file.

This patch brings this behavior for modules and is intended to be a quick fix
for the os nightly failures of kind:

Error in TCling::LoadPCM: ROOT PCM /.../libTree_rdict.pcm file does not exist
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImageGui_rdict.pcm
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImage_rdict.pcmError in TCling::LoadPCM: ROOT PCM /.../libTree_rdict.pcm file does not exist
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImageGui_rdict.pcm
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImage_rdict.pcm

The continuation of this work is in #5420
@pcanal
Copy link
Member

pcanal commented Apr 22, 2020

The intent of this patch is to fix the failing OSX tests with

Can you remind me why:
a) it only fails on the MacOS node?
b) it (seems to) not be reproduce-able on my own Mac?

@pcanal
Copy link
Member

pcanal commented Apr 22, 2020

On the jenkins build node the error appears likely because of '-Dsoversion="On"' and the following. One test that fails is 'runload.C' in roottest/root/treeformula/sync. There we have (essentially):

 {
     gROOT->ProcessLine(".L loadcode.C+");
      ...
      TTree *t; _file0->GetObject("t",t);
}

The use of TTree means that before executing the script libTree is laoded.
During the execution of CompileMacro, the depend library are loaded (see for example the collection line 3363 .
And in this case the result is:

loadcode_C.so libTree.6.21.01.so libTree.so

which coupled with the new feature that gSystem->Load first dlclose the library then reopen it means that libTree is dlclose 2 or 3 times (one for both the versioned and unversioned version).

This behavior is a very significant departure from the existing behavior (where, because the libraries were added to the link line, the library were never reloaded).

Reloading arbitrary depend library is never a good idea since they (are likely to) include static object that may do things that are essential to do only once (initialization, connection to db).

Technically CompileMacro could (as it does elsewhere) first check if the library is loaded.

But still several questions:

  1. Why is libTree listed twice in the list of depend library?

  2. When should gSystem->Load automatically reload a library, if ever?

@Axel-Naumann @vgvassilev opinions?

PS. Even-though the set of test would be fixed by updating gSystem->Load and/or CompileMacro, the PR is still likely necessary for the case where CompileMacro generates a pcm (and thus need to support reload) or the case where the user explicitly unload a library.

@vgvassilev vgvassilev force-pushed the fix_osx_reload_rdicts branch from 03acedc to afe3d65 Compare April 23, 2020 11:45
@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos7-multicore/default, ROOT-fedora29/python3, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1015/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link

Build failed on mac1015/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See cdash.
See console output.

Errors:

  • [2020-04-23T11:49:56.659Z] CMake Error at googletest-stamp/googletest-download-Release.cmake:49 (message):

@vgvassilev
Copy link
Member Author

The intent of this patch is to fix the failing OSX tests with

Can you remind me why:
a) it only fails on the MacOS node?
b) it (seems to) not be reproduce-able on my own Mac?

Because ROOT need to be built with -Dsoversion=On.

@vgvassilev
Copy link
Member Author

This behavior is a very significant departure from the existing behavior (where, because the libraries were added to the link line, the library were never reloaded).

Reloading arbitrary depend library is never a good idea since they (are likely to) include static object that may do things that are essential to do only once (initialization, connection to db).

Technically CompileMacro could (as it does elsewhere) first check if the library is loaded.

But still several questions:

1. Why is libTree listed twice in the list of depend library?

The libTree.6.21.01.so libTree.so are the same library -- one is linked to the other (forgot which way it was). The dependent symbol scanner currently does not work well with symlinks. This PR fixes it: #4717

2. When should gSystem->Load automatically reload a library, if ever?

If we decide that it won't ever reload (which I do not see issues with) then we should reimplement the calls to gInterpreter->Load as it translates them to .L libName which reloads. Alternatively, we could check if the library changed and only reload it if it changed.

@Axel-Naumann @vgvassilev opinions?

PS. Even-though the set of test would be fixed by updating gSystem->Load and/or CompileMacro, the PR is still likely necessary for the case where CompileMacro generates a pcm (and thus need to support reload) or the case where the user explicitly unload a library.

@Axel-Naumann
Copy link
Member

As TSystem::Load() says: // don't load libraries that have already been loaded - so I guess we should be using that interface?

@pcanal
Copy link
Member

pcanal commented Apr 24, 2020

The libTree.6.21.01.so libTree.so are the same library

And it would not (and should not) matter except that it currently leads to the library being reloaded yet an additional time (for every macro that is loaded and uses TTree!!!)

If we decide that it won't ever reload (which I do not see issues with)

I think TSystem::Load should behave the same as dlopen (once upon a time it was a thin layer around it).

we should reimplement the calls to gInterpreter->Load as it translates them to .L libName which reloads. Alternatively, we could check if the library changed and only reload it if it changed.

There a case to be made for returning TSystem::Load to never reload while keeping gInterpreter->Load and ".L" being the same (so TSystem::Load and gInterpreter->Load behaving differently).

In my opinion ".L ...so" (and thus gInterpreter->Load) have several possibilities but they are indeed intimately link to the behavior of ".L ....C" and ".L ...C+", so exhaustive list:

(1.a) never reload *.so, never reload *.C and *.C+
(1.b) never reload *.so only reload *.C and *.C+ when changed
(1.c) never reload *.so always reload *.C only reload *.C+ when changed (I think this was v5 behavior)
(1.d) never reload *.so, always reload *.C and *.C+

(2.a) only reload *.so when changed, never reload *.C and *.C+
(2.b) only reload *.so when changed only reload *.C and *.C+ when changed
(2.c) only reload *.so when changed always reload *.C only reload *.C+ when changed
(2.d) only reload *.so when changed, always reload *.C and *.C+

(3.a) always reload *.so, never reload *.C and *.C+
(3.b) always reload *.so only reload *.C and *.C+ when changed
(3.c) always reload *.so always reload *.C only reload *.C+ when changed
(3.d) always reload *.so, always reload *.C and *.C+ (current behavior if I understood correctly)

Note: CompileMacro was implemented with the "only reload *.C+ when changed in mind).

(2.b) has the minimal amount of unloading but consequently has an 'unpredictable' number of static initialization.

So (1.b) might be a better alternative if we assume that user macro are unlikely to have static initialization that matter/must-be-exact (and if it does then the user can use explicit .U I guess).

Because ".L .so" might be used to load 'global' (for example Athena's or CMSSW's) libraries, I find (3.) too harsh.

@phsft-bot
Copy link

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-4.cern.ch:/build/workspace/root-pullrequests-build
See cdash.
See console output.

Failing tests:

@phsft-bot
Copy link

Build failed on ROOT-fedora29/python3.
Running on root-fedora29-3.cern.ch:/build/workspace/root-pullrequests-build
See cdash.
See console output.

Failing tests:

@phsft-bot
Copy link

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See cdash.
See console output.

Failing tests:


void TCling::UnLoadPCMImpl(const TFile& rdict)
{
// FIXME: Reverse the effect of LoadPCMImpl.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcanal, I have put all things in place except for this ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcanal, the fate of this PR depends on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcanal, I see you were working in a similar area, maybe you can also look into this and we can land this PR.

@vgvassilev
Copy link
Member Author

@phsft-bot build!

@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos7-multicore/default, ROOT-fedora29/python3, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1015/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link

Build failed on mac1015/cxx17.
Running on macphsft18.dyndns.cern.ch:/Users/sftnight/build/jenkins/workspace/root-pullrequests-build
See cdash.
See console output.

Failing tests:

@phsft-bot
Copy link

Build failed on ROOT-fedora30/cxx14.
Running on root-fedora30-2.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See cdash.
See console output.

Failing tests:

@phsft-bot
Copy link

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-4.cern.ch:/build/workspace/root-pullrequests-build
See cdash.
See console output.

Failing tests:

@phsft-bot
Copy link

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See cdash.
See console output.

Failing tests:

@phsft-bot
Copy link

Build failed on ROOT-fedora29/python3.
Running on root-fedora29-2.cern.ch:/build/workspace/root-pullrequests-build
See cdash.
See console output.

Failing tests:

@hageboeck hageboeck marked this pull request as draft September 14, 2020 15:28
@hageboeck
Copy link
Member

Converted to draft because "work in progress" label will be discarded.

osschar pushed a commit to osschar/root that referenced this pull request Dec 21, 2020
When a library is re-loaded (via .L) we should reload the corresponding rdict
file. However current ROOT (without modules) just re-reads the rdict file.

This patch brings this behavior for modules and is intended to be a quick fix
for the os nightly failures of kind:

Error in TCling::LoadPCM: ROOT PCM /.../libTree_rdict.pcm file does not exist
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImageGui_rdict.pcm
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImage_rdict.pcmError in TCling::LoadPCM: ROOT PCM /.../libTree_rdict.pcm file does not exist
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImageGui_rdict.pcm
Info in TCling::LoadPCM: In-memory ROOT PCM candidate /.../libASImage_rdict.pcm

The continuation of this work is in root-project#5420
@vgvassilev
Copy link
Member Author

@pcanal, ping.

…oaded.

If we call TCling::Load on an already loaded library we dlclose and dlopen the
library. However, currently we do not have a facility to 'reload'/undo the
effect of TCling::LoadPCM.

This patch adds a section in the dictionary for de-registration which is reverse
to TCling::RegisterModule -- TCling::UnRegisterModule. It tracks down which
library is being reloaded and does not trigger a re-read of the rdict pcm. The
current stub is can be further expanded to undo the effects caused by
TCling::RegisterModule and/or improve the TCling shutdown by running the
interpreter-dependent shutdown of the dictionary.
@vgvassilev vgvassilev force-pushed the fix_osx_reload_rdicts branch from 17d667d to a21a9ef Compare April 2, 2021 08:07
@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

@phsft-bot
Copy link

Build failed on mac1014/python3.
Running on macphsft17.dyndns.cern.ch:/build/jenkins/workspace/root-pullrequests-build
See console output.

Failing tests:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants