Addressing the last few remaining places where randoms aren't using default_rng and/or a configurable seed #354
Conversation
Codecov Report
Patch coverage:

@@           Coverage Diff            @@
##              main     #354   +/-  ##
=========================================
  Coverage  100.00%  100.00%
=========================================
  Files          38       38
  Lines        2502     2582    +80
=========================================
+ Hits         2502     2582    +80
These all look good, thanks for catching and fixing them! EDIT: Actually, does it matter if random_seed isn't explicitly in the init unpacking of the config parameters in two of them?
drewoldag left a comment
This looks ok to me. Thanks for taking care of it.
In src/rail/estimation/algos/randomPZ.py:

      # allow for either format for now
      numzs = len(data[self.config.column_name])
-     zmode = np.round(np.random.uniform(0.0, self.config.rand_zmax, numzs), 3)
+     rng = np.random.default_rng(seed=self.config.seed)
This is probably fine, but it stood out a little. The goal here is that every time _process_chunk is called, it would use the same random value, not a new random value for each call to _process_chunk, correct?
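To illustrate the concern: if a fresh generator is built from the same seed inside each call, every call replays the same stream. A minimal sketch, with a hypothetical stand-in function and arbitrary seed/rand_zmax values (not RAIL defaults):

```python
import numpy as np

def _process_chunk_draws(seed, rand_zmax, numzs):
    # A fresh generator is created with the same seed on every call,
    # mirroring default_rng(seed=self.config.seed) inside _process_chunk.
    rng = np.random.default_rng(seed=seed)
    return np.round(rng.uniform(0.0, rand_zmax, numzs), 3)

# Two same-sized "chunks" come out identical, because the generator
# is re-seeded identically at the start of each call.
chunk1 = _process_chunk_draws(87, 3.0, 5)
chunk2 = _process_chunk_draws(87, 3.0, 5)
print(np.array_equal(chunk1, chunk2))  # True
```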
In src/rail/estimation/algos/knnpz.py:

      nobs = colordata.shape[0]
-     rng = np.random.default_rng
-     perm = rng().permutation(nobs)
+     rng = np.random.default_rng(seed=self.config.seed)
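For context on this fix: in the old lines, `rng` was bound to the `default_rng` factory itself, so `rng()` built a brand-new unseeded generator and the permutation changed from run to run. A minimal sketch of the difference, where `nobs = 10` and `seed=0` stand in for the real values:

```python
import numpy as np

nobs = 10  # illustrative; in knnpz.py this is colordata.shape[0]

# Old pattern: rng is the factory function, so rng() returns a new,
# unseeded Generator and the permutation is not reproducible.
rng = np.random.default_rng
perm_unseeded = rng().permutation(nobs)

# New pattern: a Generator seeded from the stage config,
# so repeated runs give the same permutation.
rng = np.random.default_rng(seed=0)
perm_a = rng.permutation(nobs)
perm_b = np.random.default_rng(seed=0).permutation(nobs)
print(np.array_equal(perm_a, perm_b))  # True
```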
No need to update, but this file is using 0 as the default seed - just seems like we could do better 🤷
Maybe it makes sense to use the original seed plus the chunk number, so each chunk gets a different seed, but in a deterministic way.
Or am I missing something?
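A sketch of the seed-plus-chunk-number idea; the helper name and the seed values are illustrative, not from the codebase:

```python
import numpy as np

def chunk_rng(base_seed, chunk_num):
    # The same (base_seed, chunk_num) pair always reproduces the same
    # stream, while different chunks get different streams.
    return np.random.default_rng(seed=base_seed + chunk_num)

draws0 = chunk_rng(87, 0).uniform(0.0, 3.0, 4)
draws1 = chunk_rng(87, 1).uniform(0.0, 3.0, 4)
print(np.array_equal(draws0, draws1))  # False: chunks now differ

# Re-running chunk 0 reproduces its original draws exactly.
print(np.array_equal(draws0, chunk_rng(87, 0).uniform(0.0, 3.0, 4)))  # True
```

For deriving many independent per-chunk streams, NumPy also offers `np.random.SeedSequence(base_seed).spawn(n)`, which sidesteps any worry about statistically related streams from adjacent integer seeds.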
-e
I hadn't even thought of the chunk number; that's a good point, Eric. I think you're right: in our current setup, with numpy.random.default_rng(seed=self.config.seed) inside the _process_chunk function, the generator is reset to the same seed at the start of each chunk, and setting the seed to seed + chunknum fixes that. I guess the other option would be to create the rng in the init as self.rng, so it is set up once during init rather than reset on each call of _process_chunk.
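The init-time alternative might look roughly like this; the class and method names are a simplified stand-in, not RAIL's actual stage API:

```python
import numpy as np

class Stage:
    def __init__(self, seed):
        # Created once, so later _process_chunk calls continue the same
        # stream instead of restarting it from the seed.
        self.rng = np.random.default_rng(seed=seed)

    def _process_chunk(self, numzs):
        return self.rng.uniform(0.0, 3.0, numzs)

stage = Stage(seed=87)
c0 = stage._process_chunk(4)
c1 = stage._process_chunk(4)
print(np.array_equal(c0, c1))  # False: the stream advances across chunks
```

Runs still reproduce for a fixed seed, but any given chunk's values then depend on how many chunks were processed before it, which the seed + chunknum scheme avoids.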
Checking through things in RAIL for any other parallelized estimators that use an rng, I'll look through GPz_v1, FlexZBoost, Delight, and BPZ_Lite now to see if I need to make any changes on those repos.
Addressing #41, I looked for instances where we were not using numpy.random.default_rng and/or were not setting the seed from a config parameter for the stage. Having every stage that uses a random number generator go through default_rng with a configurable seed should let us isolate the effects of the rng to a particular stage, and easily change the seed via the config parameter to test the effect of the randoms on stage performance.
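The testing workflow this enables can be sketched like so; the config dict, parameter name, and default value are illustrative, not the actual RAIL config handling:

```python
import numpy as np

def run_stage(config):
    # The seed comes from the stage configuration, with an arbitrary default.
    rng = np.random.default_rng(seed=config.get("seed", 87))
    return rng.normal(size=3)

baseline = run_stage({"seed": 12345})
repeat = run_stage({"seed": 12345})
varied = run_stage({"seed": 54321})
print(np.array_equal(baseline, repeat))  # True: same seed reproduces the stage output
print(np.array_equal(baseline, varied))  # False: changing the seed probes rng sensitivity
```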