-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[cxxmodules] Implement global module indexing to improve performance. #4016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cxxmodules] Implement global module indexing to improve performance. #4016
Conversation
|
Starting build on |
b6a2f4c to
167d978
Compare
|
Starting build on |
|
Build failed on ROOT-ubuntu16/rtcxxmod. Errors:
Warnings:
|
|
Build failed on ROOT-performance-centos7-multicore/default. Warnings:
Failing tests:
|
|
Build failed on ROOT-fedora29/python3. Warnings:
Failing tests:
|
|
Build failed on ROOT-fedora27/noimt. Warnings:
Failing tests:
|
167d978 to
4363501
Compare
|
Starting build on |
|
Starting build on |
01c1356 to
a2dea2b
Compare
|
Starting build on |
|
@phsft-bot build just on ROOT-ubuntu16/rtcxxmod with flags -DCTEST_TEST_EXCLUDE_NONE=On |
|
Starting build on |
|
Build failed on ROOT-ubuntu16/rtcxxmod. Errors:
And 15 more Failing tests:
And 119 more |
|
Starting build on |
|
Build failed on ROOT-ubuntu16/rtcxxmod. Errors:
|
|
Build failed on ROOT-fedora30/cxx14. Failing tests: |
|
Build failed on ROOT-ubuntu18.04-i386/cxx14. Failing tests: |
The global module index represents an efficient on-disk hash table which stores identifier->module mapping. Every time clang finds a unknown identifier we are informed and we can load the corresponding module on demand. This way we can provide minimal set of loaded modules. Currently, we see that for hsimple.C only the half of the modules are loaded. This can be further improved because we currently load all modules which have an identifier, that is when looking for (for example TPad) we will load all modules which have the identifier TPad, including modules which contain only a forward declaration of it. Kudos Arpitha Raghunandan (arpi-r)!
In cases where we have the module cache path and prebuilt module path pointing to the same location, the FileManager should not cache the module file lookup failure because it may currently be building the module. This patch is necessary because the global module index is built from the module cache path and it is loaded from the prebuilt module path. In a full explicit or implicit module build infrastructure this is not a problem. However, in a mixed scenario (where modules for third-party dependencies are built implicitly) such as ours this is problematic. One of the reasons is that we cannot configure the prebuilt modules paths or module cache paths externally. This is because the interpreter (at construction time) #includes RuntimeUniverse which may trigger module build. This patch allows us to refactor some of the code working around this issue.
When we are generating code, CodeGen automatically tries to complete decl's redeclaration chain. This ends up a call to the external sources, one of which is our global module index (GMI). The semantics of the GMI is to aid the frontend, that is to automatically import missing modules based on the typed *by the user* identifiers. It does not intend to aid the code generation by any means. Currently this happens when completing chains for existing identifiers such as 'volume' which happens to be an indentifier part of TMVA.pcm, too. Thus, CodeGen unintentionally loads the module of TMVA. This is an overkill, but worse -- it brings recursiveness to the identifier resolution system. This patch disables looking in the GMI at codegen time.
All reasibility checks are done in the routine itself and we also call it, when we need to complete type through LookupObject(TagDecl*).
The issue is that, it does not contain the identifier for TH1.
It does not contain identifier TFrame
Fixes some failing tests
This does not require resolving conflicts or bloating the history.
We do that to fix the: +Warning in <TClass::Init>: no dictionary for class TLorentzVector is available errors. This is due to the fact that TLorentzVector and such identifiers do not get to the module.idx. They are read in ::readIndex and visited in ::writeIndex however it seems that the on-disk writer trait is filtering somehow them out.
This is needed due to see this commit: fc24e98
By setting the ROOT_EXPERIMENTAL_GMI env variable. This is useful for comparing performance results from both implementations. ROOT_EXPERIMENTAL_GMI is temporary and will go away as soon as we switch to the GMI by default.
7d4aee1 to
774c08f
Compare
|
Starting build on |
|
Build failed on mac1014/cxx17. Failing tests: |
|
Build failed on ROOT-fedora30/cxx14. Failing tests: |
|
Build failed on ROOT-fedora27/noimt. |
|
Build failed on ROOT-ubuntu18.04-i386/cxx14. Failing tests: |
|
The test failures are not related to this PR, they are present in the nightly builds. |
The global module index represents an efficient on-disk hash table which stores identifier->module mapping. Every time clang finds a unknown identifier we are informed and we can load the corresponding module on demand.
This way we can provide minimal set of loaded modules. Currently, we see that for hsimple.C only the half of the modules are loaded. This can be further improved because we currently load all modules which have an identifier, that is when looking for (for example TPad) we will load all modules which have the identifier TPad, including modules which contain only a forward declaration of it.
Kudos Arpitha Raghunandan (@arpi-r)!
We still need some performance measurements but the preliminary results are promising.
Performance
Methodology
We have a forwarding root.exe which essentially calls /usr/bin/time -v root.exe $@. We have processed and stored this information in csv files. We have run in three modes:
Run on
Ubuntu 18.10 on Intel® Core™ i5-8250U CPU @ 1.60GHz × 8Results Interpretation
A general comparison between 2) and 3) show that this PR makes ROOT about 3% faster and 25% more memory efficient.
A general comparison between 1) and 3) shows that modules are still less efficient in a few cases which is expected because the PR loads more modules than it should. This will be addressed in subsequent PRs. A good trend is that some test already show that 3) is better than 1).
The raw data could be found here. [work was done by Arpitha Raghunandan (@arpi-r)]
Depends on #4005.