
Conversation

@sschmidt23 (Collaborator)

Addressing #41, I looked for instances where we were not using numpy.random.default_rng and/or were not setting the seed from a config parameter for the stage. Having every stage that uses a random number generator construct it with default_rng and a configurable seed should let us isolate the effects of the rng to that particular stage, and easily change the seed via the config parameter to test how the randoms affect stage performance.
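For concreteness, the pattern being adopted looks roughly like this (a minimal sketch, not actual RAIL code; the stage class and the config object's seed attribute are illustrative):

```python
import numpy as np

class ExampleStage:  # hypothetical stage, for illustration only
    def __init__(self, config):
        self.config = config  # assumed to carry a `seed` parameter

    def run(self, n_draws):
        # Seeding default_rng from the stage's own config isolates this
        # stage's randomness and makes the seed easy to vary per run.
        rng = np.random.default_rng(seed=self.config.seed)
        return rng.uniform(0.0, 1.0, size=n_draws)
```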

@sschmidt23 linked an issue Apr 28, 2023 that may be closed by this pull request

codecov bot commented Apr 28, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (82da866) 100.00% compared to head (6e4ccc4) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #354   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           38        38           
  Lines         2502      2582   +80     
=========================================
+ Hits          2502      2582   +80     
Flag Coverage Δ
unittests 100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                                       Coverage Δ
src/rail/creation/degradation/grid_selection.py      100.00% <100.00%> (ø)
src/rail/estimation/algos/knnpz.py                   100.00% <100.00%> (ø)
src/rail/estimation/algos/randomPZ.py                100.00% <100.00%> (ø)

... and 1 file with indirect coverage changes


@sschmidt23 requested review from aimalz and drewoldag April 28, 2023 20:23
@aimalz (Collaborator) left a comment:


These all look good, thanks for catching and fixing them! EDIT: Actually, does it matter if random_seed isn't explicitly in the init unpacking of the config parameters in two of them?

@drewoldag (Collaborator) left a comment:


This looks OK to me. Thanks for taking care of it.

```python
# allow for either format for now
numzs = len(data[self.config.column_name])
zmode = np.round(np.random.uniform(0.0, self.config.rand_zmax, numzs), 3)
rng = np.random.default_rng(seed=self.config.seed)
```
Review comment (Collaborator):

This is probably fine, but it stood out a little. The goal here is that every time _process_chunk is called, it would use the same random value, not a new random value for each call to _process_chunk, correct?

```python
nobs = colordata.shape[0]
rng = np.random.default_rng
perm = rng().permutation(nobs)
rng = np.random.default_rng(seed=self.config.seed)
```
Review comment (Collaborator):

No need to update, but this file is using 0 as the default seed; it just seems like we could do better 🤷

@eacharles (Collaborator) commented Apr 28, 2023 via email

@sschmidt23 (Collaborator, Author) commented Apr 28, 2023

I hadn't even thought of the chunk number; that's a good point, Eric. I think you're right: in our current setup, with numpy.random.default_rng(seed=self.config.seed) inside the _process_chunk function, the generator resets to the same seed at the start of each chunk. Setting the seed to seed + chunknum fixes that. The other option would be to create the rng in the init as self.rng, so that it is set up only once during init rather than reset at each call of _process_chunk.
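A minimal sketch of the two behaviors (the class and chunk interface here are illustrative, not the exact RAIL API):

```python
import numpy as np

class ChunkedStage:  # hypothetical, for illustration only
    def __init__(self, config):
        self.config = config
        # Alternative: create the generator once here; successive calls
        # to _process_chunk then advance this one generator rather than
        # resetting it.
        self.rng = np.random.default_rng(seed=config.seed)

    def _process_chunk(self, start, end):
        # The problem: re-seeding with a fixed seed on every call makes
        # each chunk draw the identical "random" sequence.
        rng = np.random.default_rng(seed=self.config.seed)
        return rng.uniform(size=end - start)
```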

@sschmidt23 (Collaborator, Author)

Checking through things in RAIL, the only parallelized estimator that uses an rng in _process_chunk is randomPZ; I changed its seed initialization to seed = self.config.seed + start, as sketched below. All other uses of a random number generator are in non-parallelized functions and thus do not need this addition. However, any future parallelized stage that draws randoms in a chunked function will have to initialize its seed similarly (e.g. the open PR on somocluSOM; I'll push a change to that branch now as well).
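Applied to the randomPZ snippet quoted above, the adopted fix looks roughly like this (a sketch; the class wrapper and method signature are illustrative, and only the seed-offset line mirrors the PR's change):

```python
import numpy as np

class RandomPZLike:  # hypothetical wrapper, for illustration only
    def __init__(self, config):
        self.config = config

    def _process_chunk(self, start, end, data):
        # The fix: offset the configured seed by the chunk's start index,
        # so each chunk draws distinct but reproducible randoms.
        rng = np.random.default_rng(seed=self.config.seed + start)
        numzs = len(data[self.config.column_name])
        return np.round(rng.uniform(0.0, self.config.rand_zmax, numzs), 3)
```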

I'll look through GPz_v1, FlexZBoost, Delight, and BPZ_Lite now to see if I need to make any changes on those repos.

@sschmidt23 merged commit a015dce into main May 3, 2023
@sschmidt23 deleted the issue/41/rando branch May 3, 2023 20:15


Development

Successfully merging this pull request may close these issues.

managing random seeds uniformly
