Skip to content

Fix/289 improve sw lw radiation omp scaling#363

Open
jirudaya wants to merge 15 commits intoMetOffice:mainfrom
jirudaya:fix/289-improve-sw-lw-radiation-omp-scaling
Open

Fix/289 improve sw lw radiation omp scaling#363
jirudaya wants to merge 15 commits intoMetOffice:mainfrom
jirudaya:fix/289-improve-sw-lw-radiation-omp-scaling

Conversation

@jirudaya
Copy link

@jirudaya jirudaya commented Mar 11, 2026

PR Summary

Sci/Tech Reviewer: Benjamin Went (@MetBenjaminWent)
Code Reviewer: Lottie Turner (@mo-lottieturner)

  • Fixed the transmute scripts for sw_kernel_mod and lw_kernel_mod. The manually added OMP pragmas are now re-added without transformation errors.
  • Added parameters to limit segment lengths for the threading call to Socrates. Tested with a limit of 32, which provided improved thread scaling.
  • Tested by running rose-stem locally with cylc run on ARCHER2.

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings
  • All automated checks in the CI pipeline have completed successfully

Testing

  • I have tested this change locally, using the LFRic Apps rose-stem suite
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

trac.log

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable)
  • Authentication and authorisation are properly implemented (if applicable)

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

  • I understand this area of code and the changes being added
  • The proposed changes correspond to the pull request description
  • Documentation is sufficient (do documentation papers need updating)
  • Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

@github-actions github-actions bot added cla-required The CLA has not yet been signed by the author of this PR - added by GA cla-signed The CLA has been signed as part of this PR - added by GA and removed cla-required The CLA has not yet been signed by the author of this PR - added by GA labels Mar 11, 2026
Copy link
Contributor

@iboutle iboutle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An upgrade macro is required to set the default values of sw_segment_limit and lw_segment_limit - I presume setting these to 0 would replicate current trunk behaviour, although if we think it's more optimal to set them to 32 from the outset then that's also fine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think any future updates to the argument lists in lw_kernel_mod.F90 and sw_kernel_mod.F90 might now require updates to the "ignore_dependencies_for" list in lw_kernel_mod.py and sw_kernel_mod.py. If this is the case, could you add comments in lw_kernel_mod.F90 and sw_kernel_mod.F90 so that developers know what would need to be changed.

Copy link
Contributor

@thomasmelvin thomasmelvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@jirudaya jirudaya force-pushed the fix/289-improve-sw-lw-radiation-omp-scaling branch from 2cedd7d to 260203a Compare March 12, 2026 13:59
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good.

Copy link
Contributor

@iboutle iboutle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi - I'm not a fan of this change - I'd prefer it be left how it was, which was consistent with the other segment size handling for other schemes, i.e. the namelist should determine the segment size, with 0 defaulting to the entire MPI rank. Please can you revert this - thanks! (Edit - not sure if it's obvious, but this comment refers to commit 9536736)

Copy link
Contributor

@iboutle iboutle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi - this should be done via an upgrade macro, rather than by editing the app configuraions by hand. Please can you revert this and add an appropriate upgrade macro. Thanks (Edit - not sure if it's obvious, but this comment refers to commit cce8025)

Also, perfectly happy for the upgrade macro to set the values to 32, if that's the current best number!

@iboutle
Copy link
Contributor

iboutle commented Mar 17, 2026

Sorry, I think I'm confusing you, but also getting confused myself!

I initially made a mistake when suggesting changes to the upgrade macro tags between the PR (363) and issue (289). I fixed this, but I don't think before you accepted the changes - but I think the number should be 363 in these rather than 289.

What I was trying to correct was that the macro tags should show that this is a macro applied to vn3.1, moving it up to vn3.1_t363. However, I can now see that the branch itself isn't at vn3.1, but instead is at an older version of the code. It may be best if you merge stable (vn3.1) into your branch (see instructions here https://github.com/MetOffice/simulation-systems/wiki/2026.03.1) - then you should see that the versions.py file is empty, apart from your single change. This will make it clearer what the upgrade macro is actually going to do!

@jirudaya
Copy link
Author

Sorry, I think I'm confusing you, but also getting confused myself!

I initially made a mistake when suggesting changes to the upgrade macro tags between the PR (363) and issue (289). I fixed this, but I don't think before you accepted the changes - but I think the number should be 363 in these rather than 289.

What I was trying to correct was that the macro tags should show that this is a macro applied to vn3.1, moving it up to vn3.1_t363. However, I can now see that the branch itself isn't at vn3.1, but instead is at an older version of the code. It may be best if you merge stable (vn3.1) into your branch (see instructions here https://github.com/MetOffice/simulation-systems/wiki/2026.03.1) - then you should see that the versions.py file is empty, apart from your single change. This will make it clearer what the upgrade macro is actually going to do!

I made a mistake of creating the branch from main instead of stable, hence the version.py is not empty as expected. I will rebase the branch to fix this

jirudaya and others added 15 commits March 17, 2026 16:16
Add ignore_dependencies_for to sw_kernel_mod.py to prevent
TransformationErrors and ensure OpenMP pragmas are re-added
during PSyclone transmutation.

Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
Add ignore_dependencies_for to lw_kernel_mod.py to prevent
TransformationErrors and ensure OpenMP pragmas are re-added
during PSyclone transmutation.

Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
Limit the number of columns per SW segment using sw_seg_limit
to improve OpenMP load balancing and prevent oversized blocks.

Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
Limit the number of columns per LW segment using lw_seg_limit
to improve OpenMP load balancing and prevent oversized blocks.

Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
- add parameters to tuning_segement_mod and initialize in um_sizes_init_mod

Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
Add a note to specify changes may be needed in the transmute script in the future

Co-authored-by: James Manners <james.manners@metoffice.gov.uk>
Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
This reverts commit 9536736.

Signed-off-by: jirudaya <158197464+jirudaya@users.noreply.github.com>
Fix macro class version
@jirudaya jirudaya force-pushed the fix/289-improve-sw-lw-radiation-omp-scaling branch from f2e3392 to 19853e3 Compare March 17, 2026 16:30
Copy link
Contributor

@iboutle iboutle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now!

@DanStoneMO
Copy link
Contributor

I tested using a local copy of the branch brought up to date, but I think this one will need a linked JEDI PR. I will link it here once that's ready

@DanStoneMO DanStoneMO added the Linked Jedi This PR is linked to a Jedi PR - this will be managed by the DA team label Mar 17, 2026
Copy link
Contributor

@DanStoneMO DanStoneMO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a re-test following a KGO change yesterday, and JEDI now works with this as-is. No linked PR will be needed. A surprise to be sure, but a welcome one.

@DanStoneMO DanStoneMO removed the Linked Jedi This PR is linked to a Jedi PR - this will be managed by the DA team label Mar 18, 2026
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jaffrey, initial changes broadly look good, however I did have some comments, thanks!

gw_segment=32
ls_ppn_segment=32
ussp_segment=4
sw_segment_limit=32
lw_segment_limit=32

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for sw_segment_limit and lw_segment_limit, the macros should sort this out when the ticket is committed.
These should be handled by the macros, and removed from this file, thanks!

! The maximum segment size is limited by lw_seg_limit_size to prevent
! overly large blocks. This ensures better load balancing across threads.
ncols_per_thread = (n_profile_list + max_threads - 1) / max_threads
nblocks = ((ncols_per_thread + lw_seg_limit_size - 1) / lw_seg_limit_size) &

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we explicitly round these, depending on our intent?

L592 and 593, where we are dividing?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And L595?

ncols_per_thread = (n_profile_list + max_threads - 1) / max_threads
nblocks = ((ncols_per_thread + sw_seg_limit_size - 1) / sw_seg_limit_size) &
* max_threads
soc_sw_block = (n_profile_list + nblocks - 1) / nblocks

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as lw_kernel_mod comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How comes we are making these changes with this PR? I don't see any content in the main ticket details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed The CLA has been signed as part of this PR - added by GA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

lfric_atm: Improve OpenMP thread scaling for SW and LW radiation kernels

7 participants